A machine learning approach to predict the parameters of COVID‐19 severity to improve the diagnosis protocol in Oman

Al Shaqsi, Jamil; Borghan, Mohamed; Drogham, Osama; Al Whahaibi, Salim

doi:10.1007/s42452-023-05495-5

Research
Open access
Published: 29 September 2023

Volume 5, article number 273, (2023)

SN Applied Sciences

Jamil Al Shaqsi¹,
Mohamed Borghan²,
Osama Drogham^3,4 &
…
Salim Al Whahaibi⁵

Abstract

The purpose of this study is to utilize a Machine Learning-based methodology for predicting the key parameters contributing to severe COVID-19 cases among patients in Oman. To carry out the investigation, a comprehensive dataset of patient information, encompassing a range of blood parameters, was acquired from major government hospitals in Oman. Diverse machine learning algorithms were deployed to uncover underlying trends within the acquired dataset. The outcomes of this research delineated the determinants of severe cases into two categories: non-blood-related parameters and blood-related parameters. Among non-blood-related factors, advanced age, gender, and the presence of chronic kidney disease emerged as risk factors contributing to unfavorable prognoses, particularly in elderly patients. In the realm of blood parameters, male patients with blood types O-positive and A-positive exhibited heightened susceptibility to severe illness compared to their female counterparts. Additionally, deviations in Hemoglobin levels, Mean Cell Volume, and Eosinophil counts were identified as drivers of poor prognoses among elderly patients. The implications of these research findings extend to aiding healthcare decision-makers in quantifying the associated risks, health benefits, and cost-effectiveness pertaining to COVID-19. Furthermore, the acquired insights can empower decision-makers to refine the management of COVID-19, expediting treatment protocols and minimizing the risk of mortality. Interestingly, the study unveiled a correlation linking blood type to disease progression. A notable finding indicated that a staggering 96.5% of patients succumbed to the disease even when their blood sodium levels remained in the standard range of 136–145 mmol/L. These insights hold immense value for healthcare institution decision-makers, allowing a more in-depth evaluation of the risks, health benefits, and the cost-effectiveness related to COVID-19. 
Consequently, the findings offer a guiding light for implementing pivotal measures, optimizing treatment protocols, and substantially reducing mortality risks associated with the virus.

Article Highlights

The study revealed a correlation between blood type and the progression of the disease. It also drew attention to a notable finding: a considerable percentage of patients (96.5%) died despite their blood sodium levels falling within the normal range of 136–145 mmol/L.
The outcomes of this research will provide valuable insights for health institutions' decision-makers, enabling them to assess the risks, health advantages, and cost-effectiveness associated with COVID-19. Additionally, the findings can guide decision-makers in implementing essential measures to safeguard patients' lives.

1 Introduction

Coronavirus disease (COVID-19), caused by SARS-CoV-2, is a new type of disease that causes a range of illnesses in humans [1, 2]. Symptoms include fever, tiredness, breathing difficulties, dry cough, and acute respiratory distress syndrome (ARDS). In more serious cases, the disease can lead to death. As of February 2023, more than 755 million positive cases had been reported, with more than 6.8 million deaths [3]. Although most COVID-19 infections were asymptomatic or mild, a significant number of cases required intensive care, and many were fatal [4]. Severe COVID-19 cases have been associated with several factors such as age, gender, race, and various comorbidities [5].

To diagnose patients efficiently and provide appropriate treatment, patient data is needed [6, 7]. This is because patient data provides essential information about each patient's parameter values [8]. This information is crucial in helping specialists to speed up diagnosis and provide rapid treatment with high accuracy [7]. Patient data can also assist in monitoring the progression of a disease over time. Such information helps in making data-driven decisions about treatment and fine-tuning the treatment accordingly. By analyzing patient data, authorities can identify common patterns among patients and take the necessary actions to minimize further spread of the disease [7]. More importantly, patient data is critical in medical research as well, since it is considered the main factor in the development of treatments and cures.

Rumaling et al. [9] introduced a new method for detecting coronavirus using Raman spectroscopy. The proposed approach uses the unique "biofingerprint" characteristics displayed by the virus to distinguish it from other pathogens. The study used a total of 150 samples divided into two parts. The first part consists of 75 nasal swab samples obtained from individuals who had contracted SARS-CoV-2 (positive cases), whereas the second part consists of the remaining 75 nasal swab samples from individuals who were in good health. By analyzing the Raman spectra of virus samples, scientists can identify specific molecular vibrations that serve as distinctive markers for coronavirus. The study emphasized the benefits of Raman spectroscopy, including its quick analysis, non-destructive nature, and minimal sample preparation requirements. The researchers conducted experiments with different coronavirus strains and obtained promising outcomes in accurately identifying and differentiating them from other viruses. The authors claimed that by utilizing the biofingerprint characteristics of the coronavirus, this technique could contribute to the creation of rapid diagnostic tools for effective disease control and surveillance.

Altan and Karasu [10] aimed to identify patients infected with coronavirus pneumonia using X-ray images in order to distinguish pneumonia caused by COVID-19 from community-acquired pneumonia. The training dataset contained only 2905 cases in total: 1341 normal images, 219 COVID-19 positive images, and 1345 viral pneumonia images. The test set contained only 581 samples: 268 normal images, 44 positive images, and 269 viral pneumonia images. A hybrid model consisting of the 2D curvelet transform, CSSA, and EfficientNet-B0 was used in this study. The accuracy, specificity, precision, recall, and F-Measure of the hybrid model were very high: 99.69%, 99.81%, 99.62%, 99.44%, and 99.53%, respectively.

Khanday et al. [11] classified textual clinical reports into four classes: COVID, SARS, ARDS, and both COVID and ARDS. The dataset was obtained from a GitHub repository. It consists of 212 patients and 24 parameters. It is an unstructured dataset that represents the clinical information of the patients, such as survival, date, extubated, intubated, modality, temperature, pO2 saturation, needed supplemental O2, leukocyte count, lymphocyte count, neutrophil count, went icu, view, location, filename, folder, DOI, and other notes [11]. The data was analyzed using different machine learning algorithms: Support Vector Machine, Multinomial Naive Bayes, Logistic Regression, Decision Tree, Random Forest, Bagging, Adaboost, and Stochastic Gradient Boosting. About 70% of the dataset was used for training and 30% for testing. In addition, cross-validation was applied with all the algorithms. Logistic Regression and Multinomial Naive Bayes achieved better results than the other algorithms, with a precision of 94%, recall of 96%, F1 score of 95%, and accuracy of 96.2%.

In Oman, previous reports have shown that obesity, chronic lung disease, and chronic kidney disease are risk factors for COVID-19-related intensive care unit admission and death in patients over 60 years old [12]. These underlying conditions have also been shown to be associated with severe COVID-19 disease in other countries [13, 14]. These studies and others strongly indicate a link between various underlying health conditions and susceptibility to negative COVID-19 effects. However, the emergence of highly transmissible new variants has further complicated the landscape of COVID-19. The strains circulating in Oman are thought to be genetically diverse (different variants) because the virus arrived from multiple sources, as infected people came from different countries [15]. Countries around the world encountered different variants of COVID-19 [16, 17]; hence, they responded differently [18]. Analyzing the different variants will help in minimizing the risk of death, as some of these variants evade antibodies and leave the immune system unable to tackle the virus [19]. Consequently, some variants escape the immunity acquired from prior infection or vaccination [19]. Understanding the parameters underlying the disease prognosis is critical for patient care and disease management [20]. The WHO confirmed that many COVID-19 patients recovered without any medical intervention [21]. In contrast, there were cases in which medical treatment was required, especially for elderly patients and those suffering from chronic diseases [22, 23]. Thus, there is a need to consider a dataset from Oman to predict the local parameters that lead to death in order to improve the diagnosis and treatment of COVID-19 in Oman. This calls for the use of more sophisticated approaches such as Machine Learning (ML) to assist in predicting the blood parameters that boost immunity against COVID-19. This will help in developing self-therapies and strengthening the immune system against future attacks. Moreover, due to the different variants of coronavirus, countries respond differently to COVID-19 patients [18].

Medical data mining using ML tools has great potential for extracting hidden patterns from large datasets that can be utilized in clinical practice [24, 25]. During the pandemic, ML has been applied to various aspects of the disease, including treatment, diagnosis, prognosis, and epidemiology [26, 27]. This study follows a design science paradigm [28] and a data-driven modeling approach to answer the following research question: How can big data and ML tools be leveraged to predict the blood parameters of COVID-19 patients and so provide an effective mechanism for better healthcare facilities? The dataset was collected from the Ministry of Health in Oman. 10-fold cross-validation was applied to ensure the reliability of the results. This study proposes guidelines for stakeholders to manage and mitigate the impact of COVID-19 or similar diseases on patients. It revealed that abnormalities in Hemoglobin, Mean Cell Volume, and Eosinophil counts are the main risk factors for poor prognosis in older patients. This finding will help decision-makers in restructuring and prioritizing the treatment protocol.

2 Related work

The variability in the severity of COVID-19, a respiratory ailment with widespread global impact, is notable among different individuals. The identification of clinical markers that possess the ability to accurately anticipate the severity of a given ailment can greatly facilitate the implementation of timely and focused intervention strategies. Shang, Dong, et al. [29] conducted an analysis of clinical data from 443 individuals diagnosed with COVID-19, categorizing them into non-severe and severe groups. The researchers’ investigation identified the Neutrophil-to-lymphocyte ratio (NLR) and C-reactive protein (CRP) as two notable predictors of illness severity. These findings align with previous meta-analytical research conducted by Lagunas-Rangel, which emphasized that higher NLR values were indicative of more severe manifestations of COVID-19. The historical establishment of CRP as an inflammatory marker and its association with higher levels in individuals infected by the 2019-nCoV have demonstrated its importance in the context of acute lung injury. This assertion was additionally supported by prior studies that identified hypoalbuminemia, lymphopenia, and CRP levels surpassing 40 mg/L as potential indicators of the likelihood of respiratory failure in persons infected with MERS-CoV and suffering from pneumonia. Furthermore, the investigation delved into the significance of platelets in relation to the severity of COVID-19, revealing that a higher platelet count may serve as a potential safeguard against the development of severe symptoms associated with the illness. This is consistent with the research conducted by Georges, Brogly et al. [30] on the topic of thrombocytosis in patients diagnosed with severe community-acquired pneumonia, as published in the Chest journal. 
That work reported a correlation between severe thrombocytopenia and heightened mortality in cases of severe community-acquired pneumonia. However, in light of the divergent perspectives outlined in Elmaraghy's research concerning the correlation between platelets and mortality among individuals with pneumonia, further examination is necessary to fully comprehend the precise role of platelets in forecasting the severity of COVID-19.

The study conducted by Masana et al. [31] examined a total of 1411 COVID-19 patients who were admitted to the hospital. The objective of the study was to investigate the potential association between the patients’ plasma lipid profile and their clinical outcomes. The study findings demonstrated that people who experienced severe indications of COVID-19 exhibited a significant decrease in high-density lipoprotein (HDL) cholesterol levels and an increase in triglyceride levels, both prior to and during their infection. Significantly, the aforementioned lipid abnormalities have been recognized as robust indicators of a serious progression of the disease. In addition to these findings, it was shown that the lipid profile had a close association with ferritin and D-dimer concentrations, while appearing to be unrelated to CRP levels. The study emphasized the significance of dyslipidemia, particularly atherogenic dyslipidemia, in relation to adverse COVID-19 results. The aforementioned observations about the lipid profile of individuals indicate the possible usefulness of this profile as an indicator of inflammation, hence justifying the need to assess its value in cases with COVID-19. Moreover, the techniques utilized for this research highlight the significance of taking into account intricate non-linear associations among predictors. This is exemplified by the implementation of Random Forests models in conjunction with regularized logistic regression. Cumulatively, these findings underscore the significance of the lipid profile as more than just a basic biomarker, but rather as an integral element within the broader context of COVID-19’s interaction with the physiological mechanisms of the host.

An investigation, undertaken by Statsenko et al. [32], involved the evaluation of a cohort including 560 patients who were diagnosed with COVID-19 at the Dubai Mediclinic Parkview Hospital. The study was done during a period extending from February to May 2020. The researchers had difficulties in establishing cut-off values due to an imbalanced dataset, particularly due to the unequal distribution of ICU hospitalizations and non-severe cases. Nevertheless, by skillfully implementing a supervised machine learning algorithm, the thresholds were modified in order to improve the predicted accuracy of the model. Consequently, certain laboratory test thresholds were established and rationalized, including a lymphocyte count below 2.59 × 10^9/L and a C reactive protein level of 14.3 mg/L, among other criteria. It is noteworthy that during the analysis of neural network performance, a model that was trained exclusively on a subset of high-value tests, specifically aPTT, CRP, and fibrinogen, produced an area under the curve (AUC) of 0.86. This performance was found to be quite equivalent to that of a more comprehensive model trained using all available tests, which yielded an AUC of 0.90. These insights present encouraging opportunities for physicians, equipping them with distinct laboratory indicators to monitor, which may potentially inform interventions and decisions regarding patient care throughout the ongoing epidemic. In addition, the creation of an openly available digital platform derived from the research outcomes offers a pragmatic asset for healthcare practitioners on a global scale.

Aktar et al. [33] conducted a comprehensive study to explore the possibilities of employing peripheral blood data for predicting clinical outcomes in patients with COVID-19. The aim of the study was based on the recognition that the rapid analysis of blood samples might provide valuable information not only for confirming diagnoses, but also for predicting the progression of the disease. Utilizing a diverse variety of machine learning algorithms, such as the random forest, gradient boosting machine, and deep learning techniques, the researchers conducted an in-depth analysis of clinical records in order to identify potential correlations and discern unique patterns. The findings of their study revealed certain hematological indicators that have the potential to act as distinguishing variables between individuals who are uninfected with COVID-19 and those who have contracted the virus. Significantly, the researchers discovered certain subsets of data, such as lactate levels, immature granulocytes, hemoglobin, and others, which exhibited substantial predictive capability in relation to the severity of symptoms associated with COVID-19. The approaches employed in their study demonstrated highly encouraging accuracy ratings, frequently over 90%. These results indicate that the routine analysis of blood samples has the potential to aid in the timely detection of individuals at a heightened risk level. These technological breakthroughs provide significant potential, particularly in underdeveloped nations where there is a lack of critical care services. Despite certain limitations, such as the relatively small sample size and the absence of comprehensive clinical data, Aktar et al. [33] have presented a fundamental study that highlights the indispensable contribution of machine learning and clinical blood data in augmenting the response to COVID-19. This study underscores the necessity for additional international investigations in this field.

The potential of hematological indicators in assessing severity and mortality risk among COVID-19 patients in Pakistan is demonstrated by the retrospective investigation of Asghar et al. [34]. The study involved 191 individuals who tested positive for COVID-19 by polymerase chain reaction (PCR). Its findings provided valuable insights, notably on the significance of mean hemoglobin levels, leukocyte count, Neutrophil-to-Lymphocyte Ratio (NLR), and various other characteristics. A notable difference in these hematological indicators was observed between patients admitted to general wards and those in intensive care units (ICUs), with particular attention to the significant variance in NLR and Platelet-to-Lymphocyte Ratio (PLR) between the two cohorts. The differences observed between survivors and non-survivors were also noteworthy. The analysis indicated that elevated levels of NLR and PLR were observed in severe cases of COVID-19. Conversely, the Lymphocyte-to-Monocyte Ratio (LMR) and Lymphocyte-to-C-reactive protein Ratio (LCR) demonstrated an inverse correlation with the severity of the disease. The study emphasizes that while inflammatory and hematological indicators offer valuable insights into the initial phases of the disease, their effectiveness in predicting overall mortality or therapeutic outcomes throughout inpatient treatment may be limited. It highlights the need for further comprehensive follow-up studies to use these indicators effectively for patient care and prognosis, while underlining the importance of NLR, PLR, LMR, and LCR in relation to the severity of COVID-19 and mortality rates.
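The four ratios discussed above are simple quotients of routine laboratory values. The following is a minimal sketch of how they could be computed; the function name and the input values are hypothetical and are not taken from Asghar et al. [34]. Cell counts are assumed to be in 10^9/L and CRP in mg/L.

```python
def hematological_ratios(neutrophils, lymphocytes, platelets, monocytes, crp):
    """Compute NLR, PLR, LMR and LCR from routine laboratory values.

    Cell counts are assumed to share one unit (e.g. 10^9/L); CRP is assumed in mg/L.
    """
    return {
        "NLR": neutrophils / lymphocytes,  # neutrophil-to-lymphocyte ratio
        "PLR": platelets / lymphocytes,    # platelet-to-lymphocyte ratio
        "LMR": lymphocytes / monocytes,    # lymphocyte-to-monocyte ratio
        "LCR": lymphocytes / crp,          # lymphocyte-to-CRP ratio
    }


# Hypothetical values for illustration only (not patient data).
ratios = hematological_ratios(
    neutrophils=6.0, lymphocytes=1.5, platelets=225.0, monocytes=0.5, crp=30.0
)
print(ratios)
```

With these illustrative inputs, a falling lymphocyte count raises NLR and PLR and lowers LMR and LCR, which matches the direction of the associations reported above.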

The study conducted by Seyit et al. [35] examines the predictive capabilities of specific hematological indicators, including C-reactive protein (CRP), white blood cell count (WBC), neutrophil-to-lymphocyte ratio (NLR), and platelet-to-lymphocyte ratio (PLR), among other variables. The study, which was conducted at Pamukkale University Hospital, included a total of 233 patients who were admitted within a two-month period from March to April 2020. Individuals who were diagnosed with SARS-CoV-2 by polymerase chain reaction (PCR) testing exhibited noticeably increased levels of CRP, lactate dehydrogenase (LDH), PLR, and NLR. This observation implies a plausible association between the elevated biomarkers and the presence of COVID-19. In contrast, people who tested negative for the virus exhibited significantly higher eosinophil, lymphocyte, and platelet counts. These results provide insight into the potential of using these characteristics as additional diagnostic tools that complement conventional real-time PCR evaluations. Nevertheless, it is important to acknowledge that the study by Seyit et al. takes a retrospective approach and emphasizes the need for future prospective investigations. Such work is crucial in order to refine and validate the effectiveness of these biomarkers, particularly in determining the most appropriate threshold values.

Yadav et al. [36] conducted a study to highlight some of the essential tasks related to the spread of COVID-19. Examples of these tasks are estimating the COVID-19 growth rate, predicting how COVID-19 will end, estimating the transmission rate of the virus, and examining the correlation between weather conditions and COVID-19. The dataset covers several countries: Mainland China, the US, Italy, South Korea, and India. It records the total number of positive cases, recoveries, and deaths over 93 days. Temperature, wind speed, and humidity data were used to find the correlation between the spread of COVID-19 and weather conditions. The Support Vector Regression method was used in this research, and its results were compared with other regression models such as Simple Linear Regression and Polynomial Regression. Although the reported accuracy rates are considered high, the methods used to measure accuracy were not stated.

Khalifa et al. [37] employed machine learning algorithms to classify coronavirus treatment type and level on a single human cell. The dataset was obtained from the RxRx.ai repository. It consists of more than 1660 types of approved drugs in a human cell and more than 300,000 listed experiments. A DCNN model was used in this research; it comprises three ReLU layers, three pooling layers, and two fully connected layers. The proposed model was compared with other machine learning algorithms such as support vector machine, decision tree, and ensemble. In terms of treatment classification, the DCNN model recorded the highest accuracy, 98.05%, in comparison with the other algorithms. In terms of treatment level, classical machine learning (ensemble) recorded a score close to that of the proposed DCNN model (98.5% and 98.2%, respectively).

Wu et al. [38] employed a machine learning algorithm to help speed up the diagnosis of COVID-19 patients. The dataset contains 253 samples and 49 parameters. The parameters included age, gender, tuberculosis, lung cancer, and pneumonia. To predict the target class, the random forest algorithm was used. The 10-fold cross-validation method was used to validate the obtained classification results. To evaluate the classification accuracy, several measurements were used, such as the Matthews correlation coefficient (MCC), AUC, and total accuracy (ACC). The experimental results showed that the developed model is highly effective. It succeeded in shortening the laboratory blood test process by providing a speedy diagnosis for infected patients. However, the major limitation of the study by Wu et al. [38] is that only one algorithm was employed in the experiments, without an appropriate justification for not considering other effective algorithms.

Based on the above, it can be noticed that most of the studies did not use blood parameters in their experiments. Altan and Karasu [10] is limited to pneumonia images, Yadav et al. [36] used a weather dataset, and Khalifa et al. [37] addressed the human cell. Wu et al. [38] used a few promising parameters; however, only a single algorithm was employed in the experiments. The dataset used by Khanday et al. [11] is limited to unstructured clinical data. On top of this, due to the different variants of coronavirus, countries respond differently to COVID-19 patients [18]. Thus, there is a need to consider a dataset from Oman to predict the parameters that lead to death in order to improve the diagnosis and treatment of COVID-19 in Oman.

3 Data and methods

The clinical dataset, which includes CBC test results, was collected from the Royal Hospital in Oman. A total of 437 cases with a mean age of 33.54 ± 26.82 years (range: 13–90) were studied (79.6% female). All cases had a positive RT-PCR test for COVID-19 and were admitted to the hospital. Eighty-one percent (354 cases) recovered, while 19% (83 cases) died. The dataset includes various parameters such as age, gender, complete blood count, comorbidities, and blood type. The primary outcome for this study was death as a result of COVID-19 disease. Initially, some of the parameters in the dataset had missing values. Removing the cases with missing values would have left a small, imbalanced dataset. Thus, all parameters that had missing values were removed instead.
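The trade-off described above, dropping parameters rather than cases, can be sketched as follows. This is an illustrative sketch only: the records, parameter names, and helper functions are hypothetical and do not reproduce the study's dataset or tooling. `None` marks a missing value.

```python
# Hypothetical records; None marks a missing value. Not the study's data.
records = [
    {"age": 34, "gender": "F", "hemoglobin": 8.9, "ferritin": None},
    {"age": 71, "gender": "M", "hemoglobin": 10.2, "ferritin": 410.0},
    {"age": 52, "gender": "F", "hemoglobin": 7.8, "ferritin": None},
]


def drop_incomplete_parameters(rows):
    """Remove every parameter (column) that is missing in at least one case."""
    complete = [k for k in rows[0] if all(r.get(k) is not None for r in rows)]
    return [{k: r[k] for k in complete} for r in rows]


def drop_incomplete_cases(rows):
    """Alternative strategy: remove every case (row) with any missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


by_parameter = drop_incomplete_parameters(records)
by_case = drop_incomplete_cases(records)

print(len(by_parameter), "cases kept,", list(by_parameter[0]), "parameters kept")
print(len(by_case), "case kept when incomplete cases are dropped instead")
```

In this toy example, dropping the incomplete parameter keeps all three cases, whereas dropping incomplete cases would keep only one, which illustrates why the column-wise strategy preserves sample size.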

Table 1 summarizes the blood parameters of the patients with different disease outcomes. CBC analysis showed that most COVID-19 patients had changes in most blood parameters. Although most of the blood parameters were within the normal range, a few parameters had means outside it. These included hemoglobin (8.71 g/dL ± 1.92), RBC counts (3.5 × 10⁶ ± 0.79), mean cell hemoglobin (24.86 pg ± 2.36), red cell distribution width-CV (17.63% ± 2.19), neutrophils (6.64 K/μL ± 7.95), eosinophils (0.03 K/μL ± 0.08), platelet count (197.59 K/μL ± 112.33), and creatinine (243.35 mm/L ± 165.50). These results are consistent with previous studies that reported low thrombocyte and neutrophil counts in COVID-19 positive patients [39]. The cohort in this study is overwhelmingly composed of anemic patients. Previous studies have also found that COVID-19 patients, particularly female patients, had lower hemoglobin, which is further confirmed in this study [39]. Around 88.8% of the subjects in this study had hemoglobin lower than the normal range (11.5–15.5 g/dL), and most of these subjects are female (79.6%). Among 348 females, 324 (93.1%) have hemoglobin below the normal range, while 63 out of 89 male patients (70.8%) have hemoglobin below the normal range. Moreover, the majority of the patients (86.7%) had renal malfunction, as indicated by the elevated serum creatinine level. A few other parameters were also abnormal, although their overall means were within the normal ranges. These parameters included platelet count, neutrophil count, and basophil count, for which a significant number of subjects had counts either below or above the normal range.

Table 1 Dataset of the study

This research adopts the Knowledge Discovery in Databases (KDD) methodology [40, 41] for carrying out artificial intelligence and machine learning projects. KDD is widely used in the literature by many researchers due to its effectiveness [42, 43]. It normally ensures that the obtained results are of high quality, scientifically valid, and potentially useful [43]. Based on the type of knowledge that can be discovered in a dataset, KDD techniques can be broadly classified into several categories, including clustering, classification, association, estimation, etc.

Following a typical KDD process roadmap, in which data mining is the core of the overall process, the experiments go through the following steps in the specified order, as shown in Fig. 1: problem specification, resourcing, data cleansing, pre-processing, data mining, evaluation of the results, interpretation of the results, and exploitation of the results.

To identify the most effective parameters of the dataset, two feature selection algorithms were employed: InfoGain and Correlation. The Information Gain algorithm evaluates the worth of a feature by measuring the information gain with respect to the target class. The result of this algorithm is presented in Fig. 2A. The Correlation algorithm is commonly used in the literature by many researchers. It calculates the correlation between each feature and the target class to determine the predictive value of the feature. The result of this algorithm is presented in Fig. 2B. As shown, both algorithms ranked similar parameters as significant predictive factors in determining the final outcome and disease progression. Age was the most important predictive factor, followed by other factors that included gender, hemoglobin, hematocrit, mean cell volume, and eosinophil count (Fig. 2A and B).
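The two ranking criteria can be illustrated with a short sketch. It assumes that information gain is computed on a discrete (or already discretised) feature as H(class) − H(class | feature), and that the correlation criterion is Pearson's correlation between a numeric feature and a 0/1-coded outcome. The data below are hypothetical toy values, not the study's dataset, and the code is not the tool used by the authors.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """H(labels) minus the entropy remaining after splitting on the feature."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def pearson(xs, ys):
    """Pearson correlation coefficient between two numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy data: four patients, outcome coded as labels and as 0/1.
outcome = ["died", "died", "recovered", "recovered"]
outcome_01 = [1, 1, 0, 0]
age_group = ["old", "old", "young", "young"]   # perfectly separates the outcome
gender = ["M", "F", "M", "F"]                  # carries no information here
age_years = [80, 70, 30, 20]

scores = {
    "age_group": information_gain(age_group, outcome),
    "gender": information_gain(gender, outcome),
}
ranked = sorted(scores, key=scores.get, reverse=True)
age_correlation = pearson(age_years, outcome_01)

print(scores, ranked, round(age_correlation, 4))
```

Ranking features by either score in descending order gives the kind of ordering shown in Fig. 2; in the toy data the age feature ranks first under both criteria.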

One of the most important issues in predictive analysis is measuring and evaluating classification quality, usually in terms of accuracy. Many measurements are used to evaluate the effectiveness of a model, including Accuracy, Precision, Recall, F-Measure, MCC, ROC Area, and PRC Area.

Accuracy: a metric that is widely used in the context of classification. In practice, two measurements are commonly used for estimating classification quality: accuracy and error. The accuracy $r$ can be measured by $r=\frac{1}{n}\sum_{i=1}^{C}{a}_{i}$ [44,45,46,47,48,49,50], where $a_i$ is the number of majority cases with the same label in class $i$, $C$ is the number of classes, and $n$ is the total number of cases in the dataset. Hence, the classification error can be obtained by $e = 1 - r$. The smaller the value of $e$, the better the results. The same logic is employed in [51], but the function is presented differently:

$$E_{c} = \frac{\sum_{i=1}^{k} \left( s_{i} - M_{i} \right)}{\sum_{i=1}^{k} s_{i}} = \frac{\sum_{i=1}^{k} \left( s_{i} - M_{i} \right)}{n}$$

where $s_i$ is the size of class i and $M_i$ is the number of majority cases with the same label in class i.
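To see that the two formulations agree, the following Python sketch evaluates both on a small hypothetical example in which each class is represented by the list of true labels of its members. The data and function names are illustrative assumptions, not taken from the study.

```python
from collections import Counter

def majority_accuracy(classes):
    """r = (1/n) * sum of a_i, where a_i is the majority-label count in class i."""
    n = sum(len(labels) for labels in classes)
    return sum(Counter(labels).most_common(1)[0][1] for labels in classes) / n

def classification_error(classes):
    """E_c = sum of (s_i - M_i) divided by n, with s_i the size of class i."""
    n = sum(len(labels) for labels in classes)
    misplaced = sum(
        len(labels) - Counter(labels).most_common(1)[0][1] for labels in classes
    )
    return misplaced / n

# Hypothetical example: two classes, each listing the true labels of its members.
classes = [
    ["died"] * 8 + ["recovered"] * 2,   # s_1 = 10, M_1 = 8
    ["recovered"] * 9 + ["died"] * 1,   # s_2 = 10, M_2 = 9
]
r = majority_accuracy(classes)      # 17 / 20
e = classification_error(classes)   # 3 / 20
print(r, e)
```

Because the class sizes sum to n, the error computed from the second formula equals 1 − r up to floating-point rounding.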

Precision: the proportion of cases predicted as positive by the algorithm that are truly positive, for example the share of predicted deaths that were actual deaths. It is defined as follows:

$$\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}}$$

Recall: a measurement used to assess the effectiveness of the algorithm by computing the proportion of actual positives that are correctly identified. Mathematically, it is calculated as:

$$\text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}$$

F-Measure: the harmonic mean of Precision and Recall. It is calculated as:

$$\text{F-Measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

ROC Curve: ROC stands for Receiver Operating Characteristic. The curve illustrates how well a classifier separates the positive and negative classes and helps identify the optimal threshold for separating them. It is generated by plotting the TP Rate against the FP Rate for different threshold values.
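The measures defined above can be computed directly from predictions, as in the following minimal Python sketch. The labels, scores, and function names are hypothetical and serve only to illustrate the formulas; 1 denotes the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for the chosen positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn

def precision_recall_f(tp, fp, fn):
    """Precision, Recall and F-Measure; each is 0.0 when its denominator is 0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

def roc_points(y_true, scores, positive=1):
    """(FP rate, TP rate) for each distinct score threshold, highest first.

    Assumes y_true contains at least one positive and one negative case.
    """
    pos = sum(1 for t in y_true if t == positive)
    neg = len(y_true) - pos
    points = []
    for threshold in sorted(set(scores), reverse=True):
        flagged = [s >= threshold for s in scores]
        tp = sum(1 for t, f in zip(y_true, flagged) if f and t == positive)
        fp = sum(1 for t, f in zip(y_true, flagged) if f and t != positive)
        points.append((fp / neg, tp / pos))
    return points

# Hypothetical labels and predictions.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(precision_recall_f(tp, fp, fn))
print(roc_points([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

In this example Precision is 0.75 and Recall is 0.5, so the F-Measure (0.6) falls below their arithmetic mean (0.625), which is the usual behaviour of a harmonic mean when the two values differ.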

4 Results and discussions

The InfoGain and Correlation algorithms ranked the following parameters highest as significant predictive factors for the final outcome (target class): Age, Blood Group, Gender, Mean Cell Hb, Mean Cell Volume, Hemoglobin, and Haematocrit of Blood. Several machine learning models were then developed based on the most predictive parameters.

To conduct the experiments, the following classification algorithms were used: J48, Naïve Bayes, IBK, Random Forest, Random Tree, and REPTree. In these experiments, 10-fold cross-validation was used to validate the obtained classification results. A comparison of the accuracy is illustrated in Fig. 3. According to the obtained results, the RandomForest algorithm ranked first with an accuracy of 99.1%. The second-best accuracy, 97.71%, was achieved by J48. The IBK and RandomTree algorithms ranked third with an accuracy of around 97%. The lowest accuracy, 90.0%, was achieved by NaïveBayes.
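The 10-fold cross-validation procedure can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the tool used in the study: the split is shuffled but not stratified, the labels are hypothetical, and a majority-class predictor stands in for a real classifier such as RandomForest.

```python
import random
from collections import Counter

def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices); every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

# Hypothetical outcomes: 1 = died, 0 = recovered.
labels = [1] * 30 + [0] * 70

fold_accuracies = []
for train, test in k_fold_splits(len(labels), k=10):
    # Stand-in "model": always predict the majority label of the training folds.
    majority = Counter(labels[j] for j in train).most_common(1)[0][0]
    fold_accuracies.append(sum(labels[j] == majority for j in test) / len(test))

mean_accuracy = sum(fold_accuracies) / len(fold_accuracies)
print(mean_accuracy)
```

The reported accuracy of a classifier is the mean over the ten held-out folds, so each record contributes to the estimate exactly once and never while it is part of the training data.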

Considering the other effectiveness measurements used to validate the obtained results, Fig. 4 shows that the RandomForest algorithm performs best, at a value of 99.1%. The J48 algorithm scored the second-highest value for precision and recall, at 97.7%. The IBK and RandomTree algorithms ranked third and fourth, respectively. The lowest value was achieved by the NaïveBayes algorithm. Overall, the obtained results of the selected algorithms are valid; however, the RandomForest algorithm is the most effective one. The precision indicates that when the model predicts death cases, it is correct 97.7% of the time. As for the recall, it confirms that the model correctly identifies 97.7% of all the death and recovered cases. Again, comparing the selected algorithms, the F-measure confirmed that the RandomForest algorithm performs best, with a value of 99% (see Table 2). The ROC curve is one of the most important techniques for checking the effectiveness of the model as well as the algorithm. On this measure, RandomForest again scored the highest result, which confirms its standing as the best-performing algorithm.

Table 2 Detailed accuracy of the selected algorithms

As shown in Fig. 5A, patients younger than 59 years were more likely to improve and recover from the COVID-19 infection, while in patients older than 59 years, the strongest predictive factor of death was increased mean cell volume, particularly if they were older than 83 years. For patients between the ages of 59 and 87, low eosinophil count was the strongest predictive factor of death. Another decision tree, shown in Fig. 5B, identified another set of important predictive factors, which also included age, mean cell hemoglobin, monocyte count, mean cell volume, and hematocrit. In this analysis, patients who were younger than 79 years but had high mean cell hemoglobin were more likely to die. On the other hand, patients with low mean cell hemoglobin were more likely to die if they had lower mean cell volume. Older patients with low monocyte count were also at risk of death.

A relationship was observed between specific blood groups and death, as illustrated in Fig. 6. Women with blood type O-positive had a good prognosis and improved, while almost 40% of male patients with blood type O-positive died. Although the cohort was largely women and blood type O-positive is common in Oman [52], these results still represent a clear disparity in disease outcome between male and female patients based on blood type. As for other blood groups, A-positive was the second most important predictor after O-positive. In both male and female patients, A-positive blood type was slightly associated with a bad prognosis when accompanied by abnormal mean cell volume. In this regard, males were also more affected.

Generally, the obtained results confirm previous studies that report the association of severe COVID-19 disease and death with abnormalities in blood parameters such as hemoglobin, hematocrit, WBC counts, and electrolytes, as well as with some comorbidities such as chronic kidney disease [53]. The study revealed that patients who were younger than 79 years and had high mean cell hemoglobin were more likely to die [54]. In contrast, if low mean cell hemoglobin is associated with lower mean cell volume, the patient will have a high risk of death. Moreover, older patients with low monocyte count were also at risk of severe disease. Infected patients who were younger than 59 years were more likely to survive COVID-19.

Table 3 shows detailed death information by the Blood Group parameter. In total, 356 (78.9%) of the patients had blood type O+, whereas 43 (9.5%) had blood type A+, 46 (10.2%) had blood type B+, and only 6 (1.3%) had blood type B−. Among the 356 patients who were admitted to the hospital and had blood type O+, 314 (88.2%) improved and only 9% died. The majority of the patients who died were of type A+ and B+, with percentages of 51.2% and 63%, respectively.

Table 3 Detailed information of the blood group parameter

Analysis of blood types revealed that 78.9% of patients had blood type O-positive, whereas 9.5% had blood type A-positive, 10.2% had blood type B-positive, and only 1.3% had blood type B-negative (see Table 3). The statistical analysis shows that among the 356 patients who were admitted to the hospital and had blood type O-positive, 314 (88.2%) improved and only 9% died. Most of the patients who died were of type A-positive and B-positive, with percentages of 51.2% and 63%, respectively. These results are consistent with multiple reports that showed an increased risk of severe COVID-19 disease among non-O types [55].
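The proportions quoted for the blood groups can be re-derived from the counts given in the text. The short Python check below assumes that the four listed blood groups make up the whole cohort, so that the total is 356 + 43 + 46 + 6 = 451, a figure that is inferred here rather than stated in the text; the variable names are illustrative.

```python
# Patient counts per blood group as quoted in the text.
counts = {"O+": 356, "A+": 43, "B+": 46, "B-": 6}

# Assumption: these four groups cover the whole cohort.
total = sum(counts.values())

# Share of each blood group in percent, rounded to one decimal place.
shares = {group: round(100 * n / total, 1) for group, n in counts.items()}

# Share of O+ patients who improved (314 of 356), in percent.
improved_o_positive = round(100 * 314 / counts["O+"], 1)

print(total, shares, improved_o_positive)
```

Under that assumption the computed shares match the quoted 78.9%, 9.5%, 10.2% and 1.3%, and the improvement rate among O-positive patients matches the quoted 88.2%.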

Most of the subjects in this study were anemic. Anemia has multiple causes and can be associated with many other conditions such as kidney diseases and hypothyroidism. Various studies have shown that anemia (low hemoglobin level) is associated with worsening pneumonia in patients with COVID-19 [54]. Moreover, most subjects suffered from microcytic anemia, as evidenced by low mean cell volume. Microcytic anemia can be caused by iron deficiency, chronic inflammation, and thalassemia. On the other hand, monocytopenia is commonly associated with acute infections. In SARS-CoV-2 infection, monocytopenia is associated with mild and severe disease, particularly in patients with chronic conditions [56]. Eosinopenia is also common in many respiratory diseases, including COVID-19. Several studies demonstrated that eosinopenia is associated with severe COVID-19 disease and can therefore be reliably used as a prognostic marker [57, 58]. Although it is not clear how infection with SARS-CoV-2 causes eosinopenia, it is thought that eosinopenia can be caused by multiple triggers of acute inflammation, which is very relevant in SARS-CoV-2 infection [57, 58].

Several studies have reported a relationship between blood type and COVID-19 disease outcome in which non-O patients are more prone to severe disease [59, 60]. Most of the subjects in this study were O-positive females. The experimental results found that COVID-19 disease outcome in O-positive patients is largely dependent on gender. Male patients with blood type O-positive were more likely to die compared with their female counterparts. The same observation was made in patients with blood type A-positive. It is unclear, however, how specific blood types contribute to the severity of the disease in male patients. On the other hand, depending on the abnormality of the mean cell volume, blood type A-positive was slightly associated with a bad prognosis for both genders.

To sum up, this study revealed that Age, Hemoglobin, Mean Cell Volume, and Eosinophil count are the most significant factors in predicting the progression of the disease and the final outcome. Haematological manifestations such as low lymphocyte and eosinophil numbers have prognostic significance and are highly prevalent in COVID-19 patients. The reasons behind the drop in these parameters are summarized as follows:

Low eosinophil count (eosinopenia) is very common in many respiratory diseases, including COVID-19. Several studies demonstrated that eosinopenia is associated with severe COVID-19 disease and can therefore be reliably used as a prognostic marker. Although it is not clear how infection with SARS-CoV-2 causes eosinopenia, it is thought that eosinopenia can be caused by multiple triggers of acute inflammation, which is very relevant in SARS-CoV-2 infection [57, 58].
Low hemoglobin level means anemia. Anemia has multiple causes and can be associated with many other conditions such as kidney diseases and hypothyroidism. Various studies have shown that anemia is associated with worsening pneumonia in patients with COVID-19 [54].
Mean cell volume (MCV) is the measurement of the average size of the red blood cells. Most of the subjects in this study had an MCV of less than 78 fL, which is characteristic of microcytic anemia. Microcytic anemia can be caused by iron deficiency, chronic inflammation, and thalassemia.
Low monocyte count (monocytopenia) is commonly associated with acute infections. In SARS-CoV-2 infection, monocytopenia is associated with mild and severe disease, particularly in patients with chronic conditions [56].

5 Conclusion

This study employed the KDD methodology to explore the use of machine learning to predict the outcome of Coronavirus (COVID-19/SARS-CoV-2) patients. The experiments yielded two outcomes: one confirms previously published reports and papers, and the other reveals some interesting patterns of disease progression. Abnormalities in various blood parameters are associated with death from COVID-19. Low thrombocyte count, neutrophil count, and hemoglobin are all reported by other researchers to be associated with a bad COVID-19 prognosis. The analysis in this study has confirmed that patients are more likely to die if they have abnormalities related to kidney function, such as in Blood Sodium Level, Blood Chloride Level, and Serum Creatinine. The study revealed that patients who were younger than 79 years but had high mean cell hemoglobin were more likely to die [54]. On the other hand, patients with low mean cell hemoglobin were more likely to die if they had lower mean cell volume (fL). Moreover, older patients with low monocyte count were also at risk of death. There is less concern for patients who are younger than 59 years, as this group is more likely to survive COVID-19.

It is worth mentioning that there is a relationship between blood type and worsening of the disease. Such an association is largely dependent on gender. Male patients with blood type O+ and A+ are more likely to die compared with their female counterparts. Depending on the abnormality of the mean cell volume, the A+ blood type was slightly associated with a bad prognosis for both genders. As in most other studies, it has been confirmed that hemoglobin is one of the leading factors in death [54]. The study highlighted a questionable finding: a large proportion of patients (96.5%) died although their Blood Sodium, which was determined as the cause of death, was within the normal range (136–145 mmol/L).

Although the findings of this research are robust, they lack other important parameters about chronic diseases such as pneumonia, COPD, chronic renal disease, and diabetes. This is because different laboratory tests were requested for different patients depending on the patient's health condition. Providing such parameters and others would have assisted in identifying the relationship between blood parameters, chronic diseases, and disease progression with the patient status. Moreover, it would assist in reshaping the COVID-19 treatment protocol. Future work should consider obtaining the comprehensive parameters and then conducting intensive experiments and validation with big data from Oman and other GCC countries. It should also consider employing artificial intelligence algorithms to optimize the overall classification accuracy.

6 Implication

The contribution of this study is classified into two types: theoretical and practical. Theoretically, several essential contributions are proposed. The research findings should help decision makers in health institutions to quantify the risks, health benefits, and cost-effectiveness related to COVID-19. The obtained results can also help decision makers to take the necessary procedures to save patients' lives. The treatment protocol should be restructured to minimize the risk of death.

Thus, whenever an infected in-patient presents at the hospital, first and foremost, the Mean Cell Hemoglobin in picograms (Mean Cell Hb, pg) needs to be checked, particularly if the in-patient is older than 59 years. According to the Ministry of Health in Oman, the normal range for the Mean Cell Hb is between 26 and 33 pg. Thus, special attention needs to be given to patients who are between 59 and 79 years old and whose Mean Cell Hb is below 26 pg. This is because such a value is considered abnormally low and indicates that less hemoglobin is present per red blood cell. Symptoms include, but are not limited to, shortness of breath, tiredness, chest pain, and a fast heart rate. Ultimately, it might cause death if the Mean Cell Hb is not brought back to normal. If the Mean Cell Hb drops below 24.1 pg and the Mean Cell Volume in femtoliters (fL) falls below 78 fL, there is a high possibility that patients might lose their lives unless there is rapid medical intervention to ensure that the Mean Cell Volume is within the normal range (78–96 fL).

Concerning in-patients who are older than 79 years, the initial medical checkup should focus on the Monocytes. This is because Monocytes are a critical component of the innate immune system and actively contribute to the inflammatory and anti-inflammatory processes during an immune response. Consequently, if the Monocytes value drops below 0.6, this might weaken the immune system in tackling foreign elements that invade the human body. There is also a possibility of death if the Monocytes value is higher than 0.6 but the Haematocrit of Blood is not within the normal range (35–45).

The blood group is one of the criteria that must be tested as well to speed up the treatment protocol and minimize the risk of death. It has been revealed that there is an association between death and specific blood groups; therefore, the necessary measures must be taken to reduce the risk. Women with blood group A or B and Mean Cell Hb less than 26 pg are at high risk of death compared to the other blood groups. Besides, men with blood group O, A, or B are at high risk of death. Accordingly, high-risk blood groups must be given priority in treatment. Following such a treatment protocol will contribute positively to reshaping the COVID-19 management protocol by improving the strategic decisions of the treatment system.

Practically, the developed reusable model can easily be employed to predict the status of future patients: high risk (death) and low risk (recovered). This will assist in speeding up the diagnosis and the treatment.
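The checks described above can be written down as a small rule function. The Python sketch below is only an illustrative transcription of the thresholds quoted in this section, not the trained model and not a validated clinical tool. The `Patient` fields, the flag strings, the boundary handling at exactly 59 and 79 years, and the choice to apply the 24.1 pg / 78 fL rule only to the 59–79 age group are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Patient:
    age: float
    sex: str           # "M" or "F"
    blood_group: str   # ABO letter only: "O", "A", "B" or "AB"
    mch_pg: float      # Mean Cell Hb in picograms
    mcv_fl: float      # Mean Cell Volume in femtoliters
    monocytes: float   # same unit as the 0.6 threshold quoted in the text
    haematocrit: float # same unit as the 35-45 range quoted in the text

def risk_flags(p: Patient) -> List[str]:
    """Return the rules from the text that a patient triggers (hypothetical labels)."""
    flags = []
    if 59 < p.age <= 79:
        if p.mch_pg < 26:
            flags.append("low MCH, age 59-79")
        if p.mch_pg < 24.1 and p.mcv_fl < 78:
            flags.append("MCH < 24.1 pg with MCV < 78 fL")
    elif p.age > 79:
        if p.monocytes < 0.6:
            flags.append("low monocytes, age > 79")
        elif not (35 <= p.haematocrit <= 45):
            flags.append("haematocrit out of range, age > 79")
    if p.sex == "F" and p.blood_group in {"A", "B"} and p.mch_pg < 26:
        flags.append("female, group A/B, low MCH")
    if p.sex == "M" and p.blood_group in {"O", "A", "B"}:
        flags.append("male, group O/A/B")
    return flags

# Hypothetical patient, for illustration only.
print(risk_flags(Patient(85, "M", "A", 28.0, 85.0, 0.5, 40.0)))
```

Returning a list of triggered rules rather than a single label keeps each threshold traceable to the sentence it came from, which makes it easier to revise individual rules as more data become available.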

Data availability

The clinical dataset used in this study was collected from the Royal Hospital in Oman with CBC test results. This dataset is available from the corresponding author on reasonable request.

References

Chatterjee S (2021) COVID-19: tackling global pandemics through scientific and social tools. Academic Press, Cambridge
Nanda A, Tuteja S, Gupta S (2022) Machine learning based analysis and prediction of college students’ mental health during COVID-19 in India. Elsevier, Amsterdam
Dashboard W C (2023) WHO Coronavirus (COVID-19) Dashboard.
Ryalat MH, Dorgham O, Tedmori S, Al-Rahamneh Z, Al-Najdawi N, Mirjalili S (2023) Harris hawks optimization for COVID-19 diagnosis based on multi-threshold image segmentation. Neural Comput & Applic 35:6855–6873
Tian W, Jiang W, Yao J, Nicholson CJ, Li RH, Sigurslid HH, Wooster L, Rotter JI, Guo X, Malhotra R (2020) Predictors of mortality in hospitalized COVID-19 patients: a systematic review and meta-analysis. J Med Virol 92:10
Dilsizian SE, Siegel EL (2014) Artificial intelligence in medicine and cardiac imaging: harnessing big data and advanced computing to provide personalized medical diagnosis and treatment. Curr Cardiol Rep 16:441
Obermeyer Z, Emanuel EJ (2016) Predicting the future - big data, machine learning, and clinical medicine. N Engl J Med 375:13
Zhang B, Nilsson ME, Prigerson HG (2012) Factors important to patients’ quality of life at the end of life. Arch Intern Med. 172(15):1133–1142
Rumaling MI, Chee FP, Bade A, Goh LPW, Juhim F (2023) Biofingerprint detection of corona virus using Raman spectroscopy: a novel approach. SN Appl Sci 5:197
Altan A, Karasu S (2020) Recognition of COVID-19 disease from X-ray images by hybrid model consisting of 2D curvelet transform, chaotic salp swarm algorithm and deep learning technique. Chaos Solitons Fractals 140:110071
Khanday AMUD, Rabani ST, Khan QR, Rouf N, Mohi UDM (2020) Machine learning based approaches for detecting COVID-19 using clinical text data. Int J Inf Technol 12:3
Al Wahaibi A, Al Rawahi B, Patel PK, AlKhalili S, AlMaani A, Al-Abri S (2021) COVID-19 disease severity and mortality determinants: a large population-based analysis in Oman. Travel Med Infect Dis 39:101923
CDC COVID-19 Response team (2020) Severe outcomes among patients with coronavirus disease 2019 (COVID-19) In: Proceedings of the MMWR Morb Mortal Wkly Rep (United States, 16 Feb 12-March 2020)
Chen N, Zhou M, Dong X, Qu J, Gong F, Han Y, Qiu Y, Wang J, Liu Y, Wei Y, Xia J, Yu T, Zhang X, Zhang L (2020) Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan. China Descriptive Study 15(395):507–513
MOH (2020) Analysis COVID-19 gene sequencing
UK Health Security Agency SARS-CoV-2 variants of concern and variants under investigation in England. Technical Briefing 35: UKHSA 2022
Union, A. a. o. t. E. European centre for disease prevention and control. 2023
Yan B, Zhang X, Wu L, Zhu H, Chen B (2020) Why do countries respond differently to COVID-19 A comparative study of Sweden, China, France, and Japan. Am Rev Public Adm 50(6–7):762–769
NIH (2021) How COVID-19 variants evade immune response. National Institutes of Health research matters. NIH, Bethesda
Pavia C, Gurtler V (2022) Covid-19: biomedical perspectives. Elsevier, Amsterdam
WHO (2021) Coronavirus disease (COVID-19). WHO, Geneva
Fang L, Karakiulakis G, Roth M (2020) Are patients with hypertension and diabetes mellitus at increased risk for COVID-19 infection? Lancet Respir Med 8(4):21
Selvin E, Juraschek SP (2020) Diabetes epidemiology in the COVID-19 pandemic. Diabetes Care 43(8):1690–1694
Long JB, Ehrenfeld JM (2020) The role of augmented intelligence (AI) in detecting and preventing the spread of novel coronavirus. J Med Syst 44:3
Jain S, Pandey K, Jain P, Seng KP (2022) Artificial intelligence, machine learning, and mental health in pandemics: a computational approach. Elsevier, Amsterdam
Tayarani N, Mohammad H (2021) Applications of artificial intelligence in battling against covid-19: a literature review. Chaos Solitons Fractals 142:110338
Vaishya R, Javaid M, Khan IH, Haleem A (2020) Artificial intelligence (ai) applications for covid-19 pandemic. Diabetes Metab Syndr 14(4):337–339
Hevner AR, March ST, Park J, Ram S (2004) Design science in information systems research. MIS Quart 28(1):75–105
Shang W, Dong J, Ren Y, Tian M, Li W, Hu J, Li Y (2020) The value of clinical parameters in predicting the severity of COVID-19. J Med Virol 92(10):2188–2192
Georges H, Brogly N, Olive D, Leroy O (2010) Thrombocytosis in patients with severe community-acquired pneumonia. Chest J 138:5
Masana L, Correig E, Ibarretxe D, Anoro E, Arroyo JA, Jericó C, Guerrero C, Miret ML, Näf S, Pardo A, Perea V (2021) Low HDL and high triglycerides predict COVID-19 severity. Sci Rep 11:7217
Statsenko Y, Zahmi FA, Habuza T, Gorkom KN-V, Zaki N (2021) Prediction of COVID-19 severity using laboratory findings on admission: informative values, thresholds ML model performance. BMJ Open 11:2
Aktar S, Ahamad MM, Rashed-Al-Mahfuz M, Azad AK, Uddin S, Kamal AH, Alyami SA, Lin PI, Islam SM, Quinn JM, Eapen V (2021) Machine learning approach to predicting COVID-19 disease severity based on clinical blood test data: statistical analysis and model development. JMIR Med Inform 9(4):e25884
Asghar MS, Khan NA, Kazmi SJH, Ahmed A, Hassan M (2020) Hematological parameters predicting severity and mortality in COVID-19 patients of Pakistan: a retrospective comparative analysis. J Commun Hosp Internal Med Perspect 10:6
Seyit M, Avci E, Nar R, Senol H, Yilmaz A, Ozen M, Oskay A, Aybek H (2021) Neutrophil to lymphocyte ratio, lymphocyte to monocyte ratio and platelet to lymphocyte ratio to predict the severity of COVID-19. Am J Emerg Med 40:110–114
Yadav M, Perumal M, Srinivas M (2020) Analysis on novel coronavirus (COVID-19) using machine learning methods. Chaos Solitons Fractals 139:110050
Khalifa NEM, Taha MHN, Manogaran G, Loey M (2020) A deep learning model and machine learning methods for the classification of potential coronavirus treatments on a single human cell. J Nanoparticle Res 22:11
Wu J, Zhang P, Zhang L, Meng W, Li J, Tong C, Li Y, Cai J, Yang Z, Zhu J, Zhao M, Huang H, Xie X, Li S (2020) Rapid and accurate identification of COVID-19 infection through machine learning based on clinical available blood test results. medRxiv. https://doi.org/10.1101/2020.04.02.20051136
Usul E, Şan İ, Bekgöz B, Şahin A (2020) Role of hematological parameters in COVID-19 patients in the emergency room. Biomarkers Med 14(13):1207–1215
Debuse J, de la Iglesia B, Howard C, Rayward-Smith V (2001) Building the KDD roadmap. Springer, London
Debuse JC, De la Iglesia B, Howard CM, Rayward-Smith VJ (2000) Building the KDD roadmap: a methodology for knowledge discovery. In: Roy R (ed) Industrial knowledge management. Springer-Verlag, London, pp 179–196
Williams G.J. and Z., H. A case study in knowledge acquisition for insurance risk assessment using a KDD methodology. Dept. of AI, Univ. of NSW, 1996.
Rahman F.A., Desa M.I., A., W. and Haris N.A. Knowledge discovery database (KDD)-data mining application in transportation. Proceeding of the Electrical Engineering Computer Science and Informatics, 1, 1 (2014), 116–119.
Huang Z (1998) Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values. Data Min Knowl Disc 2(3):283–304
Khan S. and Kant S. Computation of Initial Modes for K-modes Clustering Algorithm using Evidence Accumulation. Morgan Kaufmann Publishers Inc., 2007.
He Z, Deng S, Xu X (2006) Approximation Algorithms for K-Modes Clustering. Springer, Berlin / Heidelberg
Z., H., Xu X., Deng S. and Deng S. K-Histograms: An Efficient Clustering Algorithm for Categorical Dataset. arXiv preprint cs/0509033 (2005).
He Z, Xu X, Deng S (2005) TCSOM: Clustering Transactions Using Self-Organizing Map. Neural Process Lett 22(3):249–262
Aranganayagi S, Thangavel K (2009) Improved K-Modes for Categorical Clustering Using Weighted Dissimilarity Measure. Int J Comput Intell 5:2
He Z, Xu X, Deng S (2005) Scalable Algorithms for Clustering Large Datasets with Mixed Type Attributes. Int J Intell Syst 20:10
Gionis A, Mannila H, Tsaparas P (2007) Clustering aggregation. ACM Trans Know Dis Data (TKDD). https://doi.org/10.1145/1217299.1217300
Al-Riyami AZ, Al-Marhoobi A, Al-Hosni S, Mahrooqi SA, Schmidt M, O’Brien S, Al-Khabori M (2019) Prevalence of red blood cell major blood group antigens and phenotypes among omani blood donors. Oman Med J 34(6):496
Sun L (2020) Combination of four clinical indicators predicts the severe/critical symptom of patients infected COVID-19. J Clin Virol. https://doi.org/10.1016/j.jcv.2020.104431
Bergamaschi G, Borrelli de Andreis F, Aronico N (2021) Anemia in patients with Covid-19: pathogenesis and clinical significance. Clin Exp Med. https://doi.org/10.1007/s10238-021-00699-8
Zietz M, Zucker J, Tatonetti NP (2020) Associations between blood type and COVID-19 infection, intubation, and death. Nat Commun 11:5761
Anurag A, Jha PK, Kumar A (2020) Differential white blood cell count in the COVID-19: a cross-sectional study of 148 patients. Diabetes Metab Syndr 14:2099–2102
Lindsley AW, Schwartz JT, Rothenberg ME (2020) Eosinophil responses during COVID-19 infections and coronavirus vaccination. J Allergy Clin Immunol 146:1–7
Soni M (2020) Evaluation of eosinopenia as a diagnostic and prognostic indicator in COVID-19 infection. Int J Lab Hematol. https://doi.org/10.1111/ijlh.13573
Latz CA, DeCarlo C, Boitano L, Png CYM, Patell R, Conrad MF, Eagleton M, Dua A (2020) Blood type and outcomes in patients with COVID-19. Ann Hematol 99(9):2113–2118
Sun L, Song F, Shi N, Liu F, Li S, Li P, Zhang W, Jiang X, Zhang Y, Sun L, Chen X, Shi Y (2020) Combination of four clinical indicators predicts the severe/critical symptom of patients infected COVID-19. J Clin Virol. https://doi.org/10.1016/j.jcv.2020.104431

Funding

This publication was supported by The Research Council TRC/CRP/SQU/COVID-19/20/03 at the Ministry of Higher Education and Scientific Research. Its findings are solely the responsibility of the authors and do not necessarily represent the official views of the TRC.

Author information

Authors and Affiliations

Information Systems Department, College of Economics and Political Science, Sultan Qaboos University, Muscat, Oman
Jamil Al Shaqsi
Department of Premedicine, College of Medicine and Health Sciences, National University of Science and Technology, Sohar, Oman
Mohamed Borghan
Prince Abdullah Bin Ghazi Faculty of Information and Communication Technology, Al-Balqa Applied University, Al-Salt, 19117, Jordan
Osama Drogham
School of Information Technology, Skyline University College, University City of Sharjah, P.O. Box 1797, Sharjah, United Arab Emirates
Osama Drogham
Falha Medical Solutions, Muscat, Oman
Salim Al Whahaibi

Contributions

JAS: visualization, methodology, experiments, analysis, interpretation and writing (original draft preparation). MB: visualization, investigation, analysis, interpretation and writing (original draft preparation). OD: analysis, validation, proofreading, reviewing and editing. SAW: data collection, data validation, domain expert in the analysis stage.

Corresponding author

Correspondence to Jamil Al Shaqsi.

Ethics declarations

Competing interests

The authors have not disclosed any competing interests.

Ethical approval

This research was carried out following the guidelines of the ethics committee of Ministry of Health in Oman number: MoH/DGPS/GSR/PROPOSAL_APPROVED/41/2020.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Al Shaqsi, J., Borghan, M., Drogham, O. et al. A machine learning approach to predict the parameters of COVID‐19 severity to improve the diagnosis protocol in Oman. SN Appl. Sci. 5, 273 (2023). https://doi.org/10.1007/s42452-023-05495-5

Received: 22 June 2023
Accepted: 14 September 2023
Published: 29 September 2023
DOI: https://doi.org/10.1007/s42452-023-05495-5
