1 Introduction

In December 2019, the first person infected with Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) was diagnosed with respiratory failure, marking the start of the global COVID-19 outbreak. The infection was observed to target the respiratory organs of the human body [1]. By March 2020 the disease had spread throughout the world, leading the World Health Organisation (WHO) to declare a pandemic [2]. Thermal scanners can detect people with fever but cannot detect coronavirus infection itself, and using Ultra Violet (UV) lamps to sterilize hands or other areas of skin can cause skin irritation [3]. The quickest and most efficient way of handling this pandemic is to use imaging together with biological and clinical comorbidities for tracking and quantification of patients based on hospitalization. One of the crucial aspects of the system is to identify patients who need urgent intubation, so that the hospital's resources can be verified and managed [4]. This provides an effective means of managing the patients as well as the available equipment in order to meet the demand created by the outbreak. Similarly, robust staging will help diagnose patients accurately, leading to two different treatments depending on the patient's condition and thereby decreasing the unnecessary utilisation of Intensive Care Units (ICUs) in the hospital [5].

In the current scenario, the diagnosis of patients is primarily based on biological and clinical biomarkers such as gender, age and comorbidities. Imaging is used primarily to follow the progress of the disease in patients based on CT scan images [6]. However, because medical experts interpret the images, intra- and inter-observer variability is a major drawback. January 2020 saw a significant increase in COVID-19 affected patients, and by March 2020 over a million people were infected worldwide. The COVID-19 virus generally produces no symptoms or minimal symptoms, and causes severe pneumonia in about 10% of patients [7]. Acute respiratory distress syndrome caused by the COVID-19 virus leads to severe damage or impairment of the lungs. A laboratory Reverse Transcription–Polymerase Chain Reaction (RT-PCR) test is performed to confirm the presence of the SARS-CoV-2 virus. However, this test faces a number of challenges, such as limited sensitivity, variability in test techniques, delays in processing and high false negative rates. The sudden outbreak of the COVID-19 pandemic shook the hospital infrastructure across the state, resulting in an overload of patients with limited hospital amenities [8].

Hospitals also require investment in resources to identify the virus and to decrease the burden on doctors. In [9], the authors established a correlation between the health-care burden and the mortality of COVID-19. Radiologists, clinicians and doctors faced considerable stress and exhaustion due to the increased number of patients who required timely diagnosis and treatment. However, given the large global population [10], it is not possible to train doctors in diagnosing and treating the disease within a short span of time, especially in regions that the outbreak had yet to reach. To address this problem, we propose the development and deployment of an AI-based methodology that detects the presence of coronavirus with the help of CT images. Doctors and physicians in hospitals currently use the Hospital Information System (HIS) to look up the ID of a patient, access the CT images and provide an appropriate diagnosis. It is crucial that high-risk patients are diagnosed and treated immediately as the COVID-19 variants and their presence in affected patients increase [11]. However, the captured CT images are stored according to the time at which they were captured. Hence it is difficult to identify the IDs of severely affected patients and give them immediate attention.

To address these shortcomings, this paper introduces an AI system that ranks patient IDs by infection probability and provides diagnoses accordingly. The AI is trained to perform both classification and segmentation, which saves 25–30% of the detection time and thereby improves the rate at which the coronavirus is detected [12]. The suspected area is identified by the segmentation subsystem, while the probability of COVID-19 is provided by the classification subsystem. A drawback of AI is the need for samples on which to train. Such a database was not available at the beginning of the pandemic due to limited knowledge of the coronavirus and its impact on the human body. Now, a wide dataset is available to the public on the various affected cases, their conditions and the treatments used. Accordingly, a total of 200 training cases from random hospitals are considered, of which 150 were identified to be infected [13]. The imaging data taken into consideration is verified by the Nucleic Acid Test (NAT). Experienced annotators annotate the samples that have diagnostic characteristics.

The biggest challenge faced during training is distinguishing COVID-19 from other pulmonary diseases. While it is easy to distinguish healthy cases from pneumonia-affected cases, it is much harder to distinguish COVID-19 patients from patients with other pulmonary infections. Hence the dataset also comprises other pulmonary diseases that need to be distinguished from COVID-19 [14]. With the available dataset, deep learning methodologies can be trained and evaluated to segment and detect the COVID-19 regions. The AI model is constructed in four stages: data collection, data annotation, model training and evaluation, and model deployment [15].

The following are the contributions made by this paper:

  • An AI system is built and deployed to automatically diagnose the CT images to identify patients affected by COVID-19.

  • The infected and contour regions are labelled and built as a new dataset to recognise and observe the changes in COVID-19.

  • A total of 200 images are processed using the hybrid PSO-SVM AI environment to study its effectiveness.

  • The workload of physicians facing outbreaks in various parts of the state will be reduced drastically.

  • The proposed model can distinguish between various pulmonary conditions along with classification and detection of lung condition.

  • This is achieved by a combination of data pre-processing, ROI extraction, feature selection and Principal Component Analysis (PCA) in the algorithm.

  • The proposed model shows an improved accuracy of up to 95.78% in classifying healthy, pulmonary embolism, pulmonary edema, pneumonia, lung cancer, chronic obstructive pulmonary disease (COPD), COVID-19, pneumothorax and asthmatic lung conditions.

2 Related work

In recent years, considerable research has been carried out on detecting and classifying lung-related diseases with the help of Artificial Intelligence. In [16], a non-invasive classification method is described that uses an electronic stethoscope to record respiratory sounds. A total of 1539 subjects were observed and 21,865 lung sounds were recorded. The results indicated that a support vector machine and a convolutional neural network could accurately classify the respiratory sounds. Similarly, [17] presents a methodology for characterising various cough sounds using transformation features, leading to accurate diagnosis of pulmonary infections. On the other hand, the authors in [18] designed a methodology to identify the irregular patterns of the diseases and to segregate them accordingly. A smartphone was used to capture 255 breath cycles to determine the irregularities in respiratory diseases. Using SVM, the authors attained an accuracy of 72% based on the respiratory sounds and inspiratory cycles [19].

A number of AI methodologies have been developed over the years to detect and diagnose respiratory diseases and illnesses, as shown in [20]. A platform known as FluSense is used in waiting areas to interact with patients in a safe manner. The technology is placed in hospitals and uses bio-clinical signals connected to symptoms of illness caused by influenza and pneumonia. The device has a neural computing engine, a microphone array and a thermal camera, and characterizes changes in the cough sounds of individuals in real time [20]. It was placed in a university hospital for a period of 7 months, during which its performance was tested. As the number of coronavirus cases increases day by day, accurate and early testing plays a crucial role in curing patients and curbing the death rate. Lack of early diagnosis and testing has been identified as a common cause of the spread of COVID-19, and could have been mainly due to confusion with similar diseases or inaccurate testing.

Two major mechanisms that could be used to identify COVID-19 infected patients were identified in [21]. One method is blood testing, while the other involves analysis of CT images. Dry cough, tiredness and persistent fever are some of the common symptoms observed in COVID-19 patients. However, there is limited research on this sudden pandemic, and researchers have contributed towards prediction and diagnosis of COVID-19 using a number of novel methodologies, including AI-based software. An engine to identify the presence of coronavirus using high-resolution CT images is developed in [22]. Based on the observations, a high risk of bias in reporting COVID-19 diagnostics was noted, along with an urgent need to use other diagnostic methods, which proved to be time consuming and costly [23]. Hence, in order to increase test capacity, a number of countries designed and developed cheaper rapid tests. The drawback of these cheap rapid tests is that false negatives and false positives are predominant, so the results cannot be trusted. Hence a number of researchers have searched for alternative solutions. Smartphone sensors are a common field in which researchers have put in ample effort, though it is still in the conceptualization phase. Table 1 provides a comparative analysis of certain existing models and evaluation metrics for detection of COVID-19.

Table 1 COVID-19 examination models

3 Proposed work

The subjects affected by COVID-19 are classified using the proposed hybrid PSO-SVM framework, as shown in Fig. 1. The Magnetic Resonance Imaging (MRI) images are pre-processed and features are extracted from them. The images then undergo principal component analysis and are passed to the hybrid PSO-SVM model. Smoothing, normalization, segmentation, pipelining and other pre-processing techniques are applied to the MRI images, and redundant information is filtered out. Normalization is essential because the slice thickness and resolution vary from sample to sample. Region of Interest (ROI) extraction and feature selection [29] are performed on the processed data. Further, the data analysis process is simplified by reducing the dimension using the principal component analysis (PCA) approach. The proposed hybrid PSO-SVM model then classifies the images into subjects with healthy, pulmonary embolism, pulmonary edema, pneumonia, lung cancer, COPD, COVID-19, pneumothorax and asthmatic lung conditions. SVM is a supervised learning algorithm developed from the principles of structural risk minimization and statistical theory. Here, the interval (margin) between the two classes of data samples is maximized by establishing a hyperplane in a high-dimensional space. The global search ability of PSO algorithms has been enhanced considerably through several variations of the model over the years [30].

Fig. 1 Hybrid PSO-SVM Framework

Figure 2 provides the hybrid PSO-SVM algorithm. Initially, the search range of the SVM parameters is set. The modified PSO parameters are then initialized. Then the fitness values are evaluated using the formula

$$ \text{fitness} = \frac{\text{True Positive} + \text{True Negative}}{\text{True Positive} + \text{False Positive} + \text{False Negative} + \text{True Negative}}. $$
(1)
Fig. 2 Hybrid PSO-SVM Algorithm
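As a minimal sketch of Eq. (1), the fitness of a candidate SVM can be computed from the four confusion-matrix counts; it is simply the classification accuracy. The function name `fitness` and the counts in the example are hypothetical and are not taken from the experiments in this paper.

```python
def fitness(tp: int, fp: int, fn: int, tn: int) -> float:
    """Eq. (1): fraction of correctly classified samples (accuracy)."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


# Hypothetical counts for one candidate SVM on a validation fold.
example_fitness = fitness(tp=90, fp=20, fn=10, tn=80)  # 170 / 200 = 0.85
```

A particle whose SVM parameters yield a higher value of this function is treated as a better candidate by the PSO search.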

Random initial parameters are used for training the model. For each particle, the evolutionary factor is calculated from the mean distance between that particle and the other particles in the swarm, normalised by the minimum and maximum distances, using the expression

$$ E_{f} = \frac{\text{Mean distance} - \text{Minimum distance}}{\text{Maximum distance} - \text{Minimum distance}}. $$
(2)
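Eq. (2) can be illustrated with a short sketch. It assumes that "mean distance" is the average Euclidean distance from a particle to every other particle, and that the minimum and maximum are taken over these per-particle means; the text does not spell this out, so this reading and the function names are assumptions.

```python
import math
from typing import List, Sequence


def mean_distances(swarm: Sequence[Sequence[float]]) -> List[float]:
    """Mean Euclidean distance from each particle to every other particle."""
    n = len(swarm)
    if n < 2:
        raise ValueError("need at least two particles")
    return [
        sum(math.dist(p, q) for j, q in enumerate(swarm) if j != i) / (n - 1)
        for i, p in enumerate(swarm)
    ]


def evolutionary_factors(swarm: Sequence[Sequence[float]]) -> List[float]:
    """Eq. (2) evaluated for every particle; each value lies in [0, 1]."""
    d = mean_distances(swarm)
    d_min, d_max = min(d), max(d)
    if d_max == d_min:
        # Assumed convention for a degenerate swarm (all means equal).
        return [0.0] * len(d)
    return [(d_i - d_min) / (d_max - d_min) for d_i in d]


# Three particles on a line: mean distances are 2.5, 2.0 and 3.5.
factors = evolutionary_factors([(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)])
```

In this toy swarm the central particle has the smallest mean distance and therefore an evolutionary factor of 0, while the outlying particle has a factor of 1.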

Based on the current state, the next generation state is updated using the following expression

$$ \Pi = \left[ {\begin{array}{*{20}c} \phi & {1 - \phi } & 0 \\ {\frac{1 - \phi }{2}} & \phi & {\frac{1 - \phi }{2}} \\ 0 & {\frac{1 - \phi }{2}} & 0 \\ \end{array} } \right] $$
(3)

where the value of ϕ is set at 0.9. Then, the inertia weight is evaluated using the formula

$$ \omega \left( {E_{f} } \right) = 0.4 E_{f} + 0.3 $$
(4)
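A small sketch of Eq. (4) shows the range of the inertia weight. It assumes that the evolutionary factor lies in [0, 1], which follows from Eq. (2) when the mean distance lies between the minimum and maximum distances; the function name is illustrative.

```python
def inertia_weight(e_f: float) -> float:
    """Eq. (4): linear map from the evolutionary factor to the inertia weight."""
    if not 0.0 <= e_f <= 1.0:
        raise ValueError("evolutionary factor must lie in [0, 1]")
    return 0.4 * e_f + 0.3


# Endpoints and midpoint of the admissible range.
weights = [inertia_weight(e) for e in (0.0, 0.5, 1.0)]
```

Under this assumption the inertia weight varies between 0.3 and 0.7, so particles with a larger evolutionary factor keep more of their previous velocity.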

The acceleration coefficients, global optimal position and particle optimal position are selected. Further, the position and velocity are updated based on the inertia weight, acceleration coefficients and time delay constants. If the maximum number of iterations is reached, the SVM with optimized parameters is obtained. Otherwise, the process loops back to fitness evaluation and repeats until the maximum number of iterations is reached. The proposed algorithm is represented below.

figure a
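The loop described above can be sketched in plain Python. This is a simplified illustration and not the algorithm of the paper: the state transition of Eq. (3) and the time delay terms are omitted, the acceleration coefficients `c1` and `c2`, the swarm size and the iteration count are assumed values, and `toy_fitness` is a hypothetical stand-in for the SVM accuracy of Eq. (1).

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def pso_maximize(
    fitness: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 20,
    max_iter: int = 100,
    c1: float = 1.5,  # assumed cognitive acceleration coefficient
    c2: float = 1.5,  # assumed social acceleration coefficient
    seed: int = 0,
) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the search range, zero velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(max_iter):
        # Eq. (2): mean distance of each particle to all other particles.
        d = [
            sum(math.dist(p, q) for j, q in enumerate(pos) if j != i)
            / (n_particles - 1)
            for i, p in enumerate(pos)
        ]
        d_min, d_max = min(d), max(d)
        spread = d_max - d_min
        for i in range(n_particles):
            e_f = (d[i] - d_min) / spread if spread > 0 else 0.0
            w = 0.4 * e_f + 0.3  # Eq. (4): inertia weight
            for k, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][k] = (
                    w * vel[i][k]
                    + c1 * r1 * (pbest[i][k] - pos[i][k])
                    + c2 * r2 * (gbest[k] - pos[i][k])
                )
                # Keep the particle inside the search range.
                pos[i][k] = min(max(pos[i][k] + vel[i][k], lo), hi)
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


def toy_fitness(params: Sequence[float]) -> float:
    """Hypothetical stand-in for SVM accuracy; its maximum of 1 is at (3, -1)."""
    x, y = params
    return 1.0 / (1.0 + (x - 3.0) ** 2 + (y + 1.0) ** 2)


search_range = [(-10.0, 10.0), (-10.0, 10.0)]
best_params, best_fit = pso_maximize(toy_fitness, search_range)
```

In the actual framework, the fitness call would train an SVM with the parameters encoded by the particle and return its accuracy, so each evaluation is far more expensive than in this toy example.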

Using this algorithm, subjects with pulmonary issues are classified by the type of issue with the help of the MRI images. The model offers improved classification for early and accurate diagnosis of COVID-19. For all classification tasks, the performance of the proposed model is evaluated using a ten-fold cross validation technique. SVM, PSO, Deep Belief Network (DBN), and Stacked Auto-Encoder (SAE) models are compared with the proposed hybrid model to test for efficiency.
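The ten-fold cross validation mentioned above can be sketched as an index split. The shuffling seed and the absence of stratification by class are assumptions; the paper does not state how the folds were formed.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


# With 200 samples, every fold holds out 20 samples and trains on 180.
splits = list(k_fold_indices(200, k=10))
```

Each sample appears in exactly one test fold, so averaging the per-fold accuracy gives an estimate that uses every image for testing once.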

4 Results and discussion

In this paper, we use a custom dataset of 200 MRI images obtained from 50 healthy subjects and 150 subjects with pulmonary disorders, including COVID-19. A classification task is performed to categorize the subjects into healthy, pulmonary embolism, pulmonary edema, pneumonia, lung cancer, COPD, COVID-19, pneumothorax and asthmatic lung conditions. Early diagnosis of COVID-19 is made possible with this technique. Experimental results show that the proposed framework provides an accuracy of 95.78%. Further, a comparison with existing models such as SVM, PSO, DBN, and SAE is also performed. From the comparison results, it is evident that the proposed framework outperforms the existing variants. Figure 3 represents the classification of the dataset based on the type of lung disease.

Fig. 3 Classification of the data based on the type of lung disease

The Hausdorff distance and the Dice similarity score are assessed to estimate the automated segmentation quality and the similarity between automated and manual segmentation, respectively. The total lung volume and the diseased lung volume are calculated to evaluate the extent of disease. Sensitivity, specificity, precision and accuracy, which are classic machine learning metrics, are used for stratifying the dataset into various categories. For the training and validation cohorts, the maximum and minimum values of the attributes are calculated to perform normalization. The same values are applied to the test set. Selection of robust biomarkers contributes towards prognosis and staging, thereby preventing overfitting. Figure 4 represents the severity analysis graph based on the percentage of lung infection to identify the disease extent.

Fig. 4 Percentage of lung infection for severity analysis
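The two segmentation measures named above can be sketched as follows. The sketch assumes that each segmentation is represented as a set of pixel coordinates and computes the Hausdorff distance over all mask pixels; practical pipelines often restrict it to contour pixels. The masks in the example are hypothetical.

```python
import math
from typing import Set, Tuple

Pixel = Tuple[int, int]


def dice_score(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Dice similarity: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # assumed convention for two empty masks
    return 2.0 * len(a & b) / (len(a) + len(b))


def hausdorff_distance(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Symmetric Hausdorff distance between two non-empty pixel sets."""
    if not a or not b:
        raise ValueError("both masks must be non-empty")

    def directed(src: Set[Pixel], dst: Set[Pixel]) -> float:
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))


# Hypothetical 4x4 manual mask and an automated mask shifted down by one row.
manual = {(r, c) for r in range(2, 6) for c in range(2, 6)}
automated = {(r, c) for r in range(3, 7) for c in range(2, 6)}

dice = dice_score(manual, automated)               # 2 * 12 / 32 = 0.75
hausdorff = hausdorff_distance(manual, automated)  # 1.0 pixel
```

A Dice score closer to 1 and a smaller Hausdorff distance both indicate closer agreement between the automated and manual segmentations.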

A specificity of 0.857 and a sensitivity of 0.956 are observed from the output. The regions with lesions are highlighted using a model pipeline in which a combination of segmentation and classification is performed. The search process of the modified PSO algorithm considered in this paper distinguishes four evolutionary states: jumping-out, exploration, exploitation and convergence. Together with the model prediction, the user interface and the highlighted lesion regions provide insightful information to physicians. Typical characteristics of lesions observed in patients affected with COVID-19 are identified, namely honeycomb lung syndrome, fiber stripes, crazy-paving pattern, vessel thickening, air bronchogram sign, intralobular septal thickening, and ground glass opacity. Table 2 compares the accuracy of the proposed and existing techniques in classifying various images. The proposed model yields more accurate results than the existing machine learning schemes.

Table 2 Accuracy comparison in percentage for various models
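For reference, the classification metrics reported in this section follow from the confusion-matrix counts as sketched below. The counts in the example are hypothetical and are not the counts behind the reported specificity, sensitivity or accuracy.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Sensitivity, specificity, precision and accuracy from the four counts."""

    def ratio(num: int, den: int) -> float:
        # Undefined ratios are reported as NaN instead of raising.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }


# Hypothetical counts: 100 diseased and 100 healthy subjects.
metrics = binary_metrics(tp=90, fp=20, fn=10, tn=80)
```

For a multi-class problem such as the nine lung conditions considered here, these quantities would be computed per class in a one-versus-rest fashion; that choice is an assumption and is not specified in the paper.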

5 Conclusion and future work

A novel hybrid PSO-SVM algorithm is developed in this paper to identify COVID-19 at early stages using MRI images. Pre-processing of images, extraction of features, PCA and classification are performed using the proposed framework for this diagnosis. The proposed model provides an accuracy of 95.78% and outperforms the conventional SVM, PSO, DBN, and SAE models used in this application. A specificity of 0.857 and a sensitivity of 0.956 are observed using the proposed framework. The typical characteristics of lesions observed in patients affected with COVID-19 are identified. Along with this, other lung diseases are also identified and differentiated. With an increase in training data, further improvement in accuracy may be obtained. Future work is directed towards using lab test results, symptoms, patient profiles and other clinical data inputs to develop a complete multi-modal model that provides enhanced screening results. Integration of this framework with a SaaS platform and cloud infrastructure will largely benefit the rapidly evolving telemedicine domain.