BIO-CXRNET: a robust multimodal stacking machine learning technique for mortality risk prediction of COVID-19 patients using chest X-ray images and clinical data

Quick and accurate diagnosis of COVID-19 is a pressing need. This study presents a multimodal system to meet this need. The system employs a machine learning module trained on data collected from 930 COVID-19 patients hospitalized in Italy during the first wave of COVID-19 (March–June 2020). The dataset consists of twenty-five biomarkers from electronic health records and chest X-ray (CXR) images. The system can classify patients as low- or high-risk with an accuracy, sensitivity, and F1-score of 89.03%, 90.44%, and 89.03%, respectively. It exhibits 6% higher accuracy than systems that employ either CXR images or biomarker data alone. In addition, the system can calculate the mortality risk of high-risk patients using a multivariate logistic regression-based nomogram scoring technique. Interested physicians can use the presented system to predict the early mortality risk of COVID-19 patients via the web link: Covid-severity-grading-AI. To do so, a physician needs to input the following information: CXR image file, Lactate Dehydrogenase (LDH), Oxygen Saturation (O2%), White Blood Cell Count, C-reactive protein, and Age. In this way, the study contributes to the management of COVID-19 patients by predicting early mortality risk. Supplementary Information: The online version contains supplementary material available at 10.1007/s00521-023-08606-w.


I. Introduction
As of May 6, 2022, over 6.2 million individuals had died and over 516 million people had been infected due to the COVID-19 pandemic [1]. Global corporate, economic, and social dynamics were all affected. Governments all over the globe implemented flight restrictions, social isolation, and hygiene awareness measures. COVID-19 can sometimes be confused with other viral infections [2,3], making identification difficult. Reverse-transcription polymerase chain reaction (RT-PCR) arrays are the approved primary diagnostic approach for COVID-19 detection [4,5]. Their detection performance can suffer due to sample contamination or damage, or viral alterations in the COVID-19 genome [6,7]. As a result, some studies [8,9] have suggested that chest Computed Tomography (CT) imaging can be used as an alternative approach. Besides, for an RT-PCR-negative patient with COVID-19 symptoms, several researchers have recommended a CT scan as a follow-up test [8][9][10]. CT scans, despite their superior performance, have several disadvantages and limitations: their sensitivity is limited for early COVID-19 cases, and image acquisition is slow and expensive. Chest X-ray (CXR) imaging, in contrast to CT, is a less expensive, faster, and more widely available technique that exposes the body to less hazardous radiation [11]. CXRs are frequently used as an alternative COVID-19 screening technique and have been demonstrated to have a high predictive value [12]. On radiological images, early COVID-19 cases showed bilateral, multifocal ground-glass opacities (GGO) with posterior or peripheral distribution, primarily in the lower lung lobes, which eventually progressed into pulmonary consolidation [13,14]. These lung abnormalities share many common characteristics with other conditions, so doctors have a hard time distinguishing COVID-19 infection from other types of viral pneumonia. In the current scenario, this similarity of symptoms could result in misdiagnosis, delayed treatment, or even death.
In recent years, great advances in deep learning have resulted in state-of-the-art performance in a variety of computer vision applications, including image classification, object recognition, and image segmentation. As a result, deep learning-based solutions have become more widely used in a variety of disciplines. With the advent of deep Convolutional Neural Networks (CNNs), their use on CXR images has been widely researched and accepted. Rajpurkar et al. [15] proposed the CheXNet network by modifying DenseNet121 and training it on one of the largest chest X-ray datasets [16], containing 100 thousand X-ray images for 14 different diseases. Similarly, Rahman et al. [17] trained CNNs to detect pulmonary tuberculosis (TB) using a dataset of 3,500 infected and 3,500 normal CXRs. They also re-trained the DenseNet201 network on TB and normal datasets and obtained state-of-the-art performance in TB detection with a sensitivity of 98.57%. Lung segmentation is used as the first step in their detection technique [18,19], which helps localize the decision-making area for the machine learning networks. They used the popular Montgomery [20] and Shenzhen [21] CXR lung mask datasets, which together provide 704 X-ray images of normal and TB patients. However, the segmentation performance sometimes suffers due to severe deformity of the lungs in extreme COVID-19 cases or low-resolution images. Khuzani et al. [22] proposed that a set of CXR image features may be constructed using a dimensionality reduction method to build an effective machine learning classifier that can distinguish COVID-19 cases from non-COVID-19 cases with high accuracy and sensitivity. Mathew et al. [23] proposed a Siamese neural network-based severity score to automatically measure radiographic COVID-19 pulmonary disease severity, which was verified against pulmonary X-ray severity (PXS) scores from two thoracic radiologists and one in-training radiologist. Kim et al. [24] proposed a fully automated triage pipeline that analyses chest radiographs for the presence, severity, and progression of COVID-19 pneumonia and achieved an accuracy of 79.9%. Maguolo and Nanni [25] questioned the reported performance of COVID-19 detection from X-rays in the literature and argued that studies should include larger and more diverse X-ray datasets to avoid biases. Robert et al. [26] reasoned along the same lines in an extensive literature review, suggesting the use of diverse and large datasets for COVID-19 detection from chest X-rays. The authors of this paper were among the first to propose state-of-the-art deep learning models to detect pneumonia [27] and COVID-19 [28] from chest X-rays. They further improved their work in [29], where they created the largest benchmark dataset with 33,920 CXR images, including 11,956 COVID-19 samples, using an effective human-machine collaborative strategy to annotate ground-truth lung segmentation masks. To the best of the authors' knowledge, this is the largest CXR lung segmentation dataset, and it can help in the development of CXR-related computer-aided diagnostic tools using deep learning methods. In this study, the authors used the model trained on that dataset to segment the lung areas from the CXR images. In a previous study [30], we investigated the impact of image enhancement techniques on segmented lungs for COVID-19 prediction and confirmed that gamma correction provided an F1-score of around 90% on a dataset of 18,479 chest X-ray images in total (8851 normal, 6012 non-COVID other lung diseases, and 3616 COVID-19) with their ground-truth lung masks.
According to recent research, biomarkers can play a dominant role in giving critical information about an individual's health and in identifying COVID-19. Sarah et al. [31] presented the Kuwait Progression Indicator (KPI) score as a predictive tool for estimating the severity of COVID-19 progression. In contrast to scoring systems that rely on self-reported symptoms and other subjective characteristics, the KPI model was founded on laboratory variables, which are objectively measurable. Patients were classified as low risk if their KPI score was below -7 and as high risk if it was above 16; however, the risk of progression was unknown for those with a score between -6 and 15. This restricts its applicability across patient groups. Weng et al. [32] published an early prediction score named ANDC to predict mortality risk in COVID-19 patients. This prediction model was developed using data from 301 adult patients with laboratory-confirmed COVID-19. LASSO regression indicated age, neutrophil-to-lymphocyte ratio, D-dimer, and C-reactive protein acquired on hospital admission as strong predictors of death for COVID-19 patients. The area under the curve (AUC) for the derivation and validation cohorts was 0.921 and 0.975, respectively, indicating that the nomogram was well calibrated and discriminated well. COVID-19 patients were categorized into three groups based on ANDC cut-off values of 59 and 101. The low-risk group (ANDC<59) had a death probability of less than 5%, the moderate-risk group (59<ANDC<101) had a death probability of 5% to 50%, and the high-risk group (ANDC>101) had a death probability of more than 50%. Xie et al. [33] developed a predictive model that combines age, lactate dehydrogenase (LDH), lymphocyte count, and SpO2 as independent predictors of death using a dataset of 444 patients; it showed good performance in both internal (c=0.89) and external (c=0.98) validations. However, the model over-predicted risk in low-risk persons and under-predicted it in high-risk patients.
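The ANDC grouping described above is a simple threshold rule. A minimal sketch follows, assuming that scores exactly equal to 59 or 101 fall into the moderate group (the text does not state how boundary values are handled); the function name `andc_risk_group` is hypothetical.

```python
def andc_risk_group(andc_score: float) -> str:
    """Map an ANDC score to a risk group using the cut-offs 59 and 101.

    Assumption: boundary scores (exactly 59 or 101) are treated as moderate.
    """
    if andc_score < 59:
        return "low"        # reported death probability below 5%
    if andc_score > 101:
        return "high"       # reported death probability above 50%
    return "moderate"       # reported death probability of 5% to 50%


# Illustrative scores only; these are not patient data.
groups = [andc_risk_group(score) for score in (40, 80, 120)]
```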
Intensive care units (ICUs) are crucial for saving severe COVID-19 patients by providing oxygen, 24-hour monitoring, care, and assisted ventilation when necessary. As a result, in areas where the COVID-19 infection rate is high, ICU beds are a valuable resource [34][35][36]. Basic blood tests and vital sign measurements are among the routinely gathered healthcare data, which are often available within the first hour of a hospital visit. These data reveal the patterns of changes in COVID-19 patients, as described in several retrospective observational studies [37][38][39]. These studies have concluded that variables including alanine aminotransferase (ALT), lymphocyte count, D-dimer, C-reactive protein (CRP), and bilirubin concentrations are important clinical indicators. Therefore, clinical biomarkers can be used to develop effective prognostic models using classical and deep learning techniques.
Although Convolutional Neural Networks (CNNs) can be trained to stratify different diseases using radiographic or other images, imaging data alone cannot consistently identify the underlying medical cause. A combination of patient symptoms, physical exam findings, laboratory data, and radiologic imaging findings (if available) can be used to determine the underlying etiology and severity. As a result, machine learning algorithms that combine information from chest X-rays with additional clinical data from the electronic health record (EHR) should be able to better predict the severity of a patient's condition. However, combining EHR data and imaging data for machine learning applications in healthcare has not been well studied. Few studies have used both radiographic images and clinical biomarker data to predict the prognosis of COVID-19 patients using artificial intelligence. Jiao et al. [40] developed a machine learning model using clinical data and CXR images to predict COVID-19 severity and progression and reported an AUC of 82%. Chieregato et al. [41] proposed a multimodal approach based on CT images and clinical parameters, which were supplied to the Boruta feature selection algorithm with SHAP (SHapley Additive exPlanations) values; a CatBoost gradient boosting classifier then achieved an AUC of 0.949 on the holdout test set with the reduced features. With a probability score based on SHAP feature importance, the model is intended to give clinical decision support to physicians. However, the reported studies either produced poor performance, used small datasets, which limits the generalizability of the models, or used the CT modality, which has its own limitations.
The above pitfalls encouraged the authors of this work to develop a multimodal system based on CXR images and clinical biomarkers to stratify the severity of COVID-19 patients and their risk of death. This is one of the first works to use both CXR and biomarkers to develop a COVID-19 severity prediction model. The work tackles the above limitations with the following contributions:
• We created a machine learning model that uses chest radiographs and clinical data to identify severe infection and death risks in COVID-19 patients.
• By combining multimodal data (image and clinical data), it was possible to boost the performance of the prognostic model significantly compared to models using either data type alone.
• A nomogram-based scoring system that combines image features and important biomarkers, along with a novel stacking machine learning technique, is proposed.
• A web application is developed using the proposed multimodal approach, which will enable clinicians to use such a tool to aid in the diagnostic process.
The rest of the article is structured as follows: Section II describes the methodology of the study, which includes the dataset description, preprocessing stages, machine learning and stacking technique, and nomogram-based scoring system development. Section III presents the findings of the experiments and reports the performance of the scoring technique, while Section IV discusses the findings. Finally, Section V concludes the article with future recommendations.

II. Methodology
Two major investigations were carried out in this study. In the first investigation, a multimodal approach using CXR images and clinical data was used to predict the severity risk of COVID-19 patients. First, the CXR images were preprocessed, the lung area was segmented and fed to a pre-trained deep CNN model to extract image features, and principal component analysis (PCA) was used to reduce the dimensionality of the extracted image features. In parallel, the clinical data were processed and the clinical features were ranked using a feature selection algorithm. Finally, the PCA components and top-ranked clinical features were combined to develop a stacking ensemble model to classify patients as low or high risk. In the second investigation, we analyzed the high-risk patients' data to predict the death outcome using the stacking model on CXR images and clinical biomarkers. Moreover, we established a nomogram-based scoring technique for the early prediction of death outcomes. Figure 1 illustrates the schematic overview of the methodology.

Dataset Description
The study utilized a dataset from the first wave of COVID-19, between March and June 2020, that included both CXRs and clinical data obtained from six Italian hospitals at the time of admission of symptomatic COVID-19 patients [42]. The dataset contains 930 X-ray images in anteroposterior (AP) or posteroanterior (PA) view and the clinical data of the corresponding COVID-19-positive patients [42].

Statistical Characteristics
Stata/MP 13.0 software was used to conduct a statistical analysis of the patients' demographics, signs and symptoms, clinical data, comorbidities, and outcomes. Gender, age, and twenty-three signs and symptoms, comorbidities, and clinical biomarkers are available in the dataset. Table 1 lists the statistical characteristics of the 25 parameters (age, gender, signs and symptoms, comorbidities, clinical biomarkers). Gender is reported as counts and percentages. The number of missing values (N), the presence and absence of signs and symptoms, the mean (M), and the standard deviation (SD) are reported for the remaining variables. Univariate analysis (Chi-square test) was performed for gender, while the rest of the variables were subjected to Wilcoxon's rank tests. A p-value of less than 0.05 was considered significant, corresponding to a 95% confidence level.

Data Preprocessing
This section discusses the data preprocessing steps for both of the data modalities in detail.

Chest X-ray Image Preprocessing

A. Gamma Correction
Image enhancement is a common image-processing technique that emphasizes significant information in an image while reducing or removing other information to improve the quality of identification. Gamma correction was applied to the CXRs because our previous work [30] showed that it enhances COVID-19 detection performance by improving image quality. Linear operations, such as pixel-wise scalar multiplication, addition, and subtraction, are often performed for image normalization, whereas gamma correction is a nonlinear operation performed on the pixels of the source image. It maps each input pixel value, which can vary between 0 and 255, to an output value through a power-law relationship controlled by the gamma value. If G is the grey-scale value of a pixel, the output pixel after gamma correction is denoted s(G), and γ represents the gamma value.
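A minimal sketch of gamma correction on an 8-bit grey-scale image, assuming the widely used power-law form s(G) = 255 (G/255)^γ. The exact expression and the gamma value used by the authors may differ, and the function name `gamma_correct` and the sample pixel values are hypothetical.

```python
import numpy as np


def gamma_correct(image: np.ndarray, gamma: float) -> np.ndarray:
    """Apply power-law (gamma) correction to an 8-bit grey-scale image.

    Assumed form: s(G) = 255 * (G / 255) ** gamma.
    """
    normalized = image.astype(np.float64) / 255.0       # scale pixels to [0, 1]
    corrected = 255.0 * np.power(normalized, gamma)     # nonlinear pixel-wise map
    return np.clip(np.rint(corrected), 0, 255).astype(np.uint8)


# Illustrative pixel row; gamma < 1 brightens mid-tones under this convention.
pixels = np.array([[0, 64, 128, 255]], dtype=np.uint8)
brightened = gamma_correct(pixels, gamma=0.5)
```

Under this convention the end points 0 and 255 are left unchanged and only the intermediate grey levels are redistributed, which is what distinguishes the operation from the linear normalization steps mentioned above.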

B. Lung Segmentation
As discussed earlier, it is very important to localize the region of interest for the machine learning networks, i.e., the lungs in the chest X-rays in this case. In our previous work on CXR lung segmentation [29], a Feature Pyramid Network (FPN) [43] segmentation network with a DenseNet121 [44] encoder as the backbone outperformed other conventional segmentation networks. In [29], a detailed investigation was carried out on three segmentation architectures, U-Net [45], U-Net++ [46], and FPN [43], with different encoder backbones. The FPN network with the DenseNet121 backbone segmented the lung area very accurately, which was also verified by experienced radiologists. Some of the X-ray images and their corresponding masks are illustrated in Figure 3.

C. Feature Extraction
A CheXNet CNN model, which is based on the DenseNet-121 [44] architecture, was used to extract important features from the segmented chest X-rays. It is worth mentioning that CheXNet is a variant of DenseNet that was trained on a large chest X-ray dataset, and the pretrained model is publicly available. It performed exceptionally well on the COVID-19 classification task, as shown in our previous work [30]. To extract useful features from the segmented lung area of the CXR images, the outputs of the last layer ('AvgPool') before the Softmax layer of the CheXNet model were used.

D. PCA for Dimensionality Reduction
A feature reduction technique called Principal Component Analysis (PCA) was applied to reduce the dimensionality of the feature space produced by the CheXNet model. It projects high-dimensional data into a new lower-dimensional representation with as little reconstruction error as possible. Because all the principal components in the reduced set are orthogonal to one another, there is no redundant information. PCA was calculated with whitening, which can improve accuracy by forcing the data to meet certain assumptions.
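A minimal sketch of PCA with whitening, implemented with a singular value decomposition in NumPy. The random matrix stands in for the CheXNet feature vectors, and the number of retained components (10) is a hypothetical choice, since the text does not state how many components were kept.

```python
import numpy as np


def pca_whiten(features: np.ndarray, n_components: int) -> np.ndarray:
    """Project features onto the top principal components and whiten them."""
    centered = features - features.mean(axis=0)
    # Rows of vt are orthonormal principal axes, sorted by singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    # Whitening rescales every component to unit sample variance.
    scale = singular_values[:n_components] / np.sqrt(features.shape[0] - 1)
    return projected / scale


rng = np.random.default_rng(0)
cxr_features = rng.normal(size=(200, 1024))   # placeholder for CNN features
reduced = pca_whiten(cxr_features, n_components=10)
```

After whitening, the retained components are uncorrelated and have unit variance, which is the kind of assumption some downstream classifiers benefit from.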

Clinical Data Preprocessing

A. Data Imputation and Normalization
The most critical phase in clinical data preprocessing for machine learning model construction is missing data imputation. Many blood biomarkers were collected for each patient, and many of them were missing for certain patients. Instead of discarding the records with missing values, different imputation techniques were investigated. Deleting records with missing variables can result in the loss of critical and contextual information and can affect how well the dataset represents the population [20]. Machine learning (ML)-based data imputation methods have become more popular for missing value imputation. On the other hand, this approach requires a distinct model for each data column with missing values. A popular data imputation technique called multivariate imputation by chained equations (MICE) was used in this study to deal with missing data. It is reported in the literature and in the authors' previous works [47][48][49][50] that the MICE technique outperforms other imputation techniques for clinical data [51].
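A minimal sketch of chained-equations imputation, assuming scikit-learn's `IterativeImputer`, which is inspired by MICE but is not necessarily the implementation the authors used. The synthetic table, the missing rate of roughly 10%, and the `max_iter` value are illustrative assumptions.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables the next import)
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
# Synthetic stand-in for a biomarker table: column 1 is correlated with column 0.
complete = rng.normal(size=(100, 3))
complete[:, 1] = 2.0 * complete[:, 0] + 0.1 * rng.normal(size=100)

with_gaps = complete.copy()
missing = rng.random(with_gaps.shape) < 0.1    # remove roughly 10% of the entries
with_gaps[missing] = np.nan

# Each column with missing values is modelled from the other columns in turn.
imputer = IterativeImputer(max_iter=10, random_state=0)
imputed = imputer.fit_transform(with_gaps)
```

Observed entries are passed through unchanged; only the missing entries are replaced by model-based estimates.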
The generalization performance of machine learning models relies strongly on the quality of the input data. The term "data normalization" refers to the process of scaling or transforming data so that each feature contributes equally to the training process. Normalization has been demonstrated to increase the performance of machine learning models in numerous studies [29]. In this investigation, Z-score normalization was used, in which the mean of each feature is subtracted and the result is divided by the standard deviation.
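A minimal sketch of Z-score normalization, z = (x - mean) / standard deviation, applied per feature. Computing the statistics on the training split and reusing them for the test split is an assumption made here to avoid information leakage; the text does not specify this detail, and the sample values are hypothetical.

```python
import numpy as np


def zscore_normalize(train: np.ndarray, test: np.ndarray):
    """Standardize each feature using the training mean and standard deviation."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0, 1.0, std)     # guard against constant features
    return (train - mean) / std, (test - mean) / std


# Illustrative values only (two features, e.g. a lab value and age).
train_raw = np.array([[200.0, 60.0], [300.0, 70.0], [400.0, 80.0]])
test_raw = np.array([[300.0, 65.0]])
train_z, test_z = zscore_normalize(train_raw, test_raw)
```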

B. Top-Ranked Features
Feature selection chooses the features that have the greatest impact on predicting the output. It helps reduce overfitting, typically improves accuracy, and greatly decreases training time. Univariate selection, principal component analysis (PCA), recursive feature elimination (RFE), bagged decision trees (e.g., random forest), and boosted trees (e.g., Extreme Gradient Boosting) are some of the feature selection methods used in the literature. Because of its capacity to handle datasets with many predictor variables, random forest frequently gives superior performance [52]. Therefore, a random forest-based feature selection technique was employed in this study to rank the 25 variables, including age, gender, signs and symptoms, comorbidities, and clinical biomarkers, for risk prediction.
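A minimal sketch of random forest-based feature ranking, assuming scikit-learn's impurity-based `feature_importances_` as the ranking criterion (the text does not specify which importance measure was used). The data are synthetic and the feature names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
feature_names = ["f0", "f1", "f2", "f3"]           # hypothetical feature names
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)      # outcome driven mostly by f0

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Sort features from most to least important.
order = np.argsort(forest.feature_importances_)[::-1]
ranking = [feature_names[i] for i in order]
```

The top-k entries of `ranking` would then be passed to the downstream classifiers, in the same way the Top-5 clinical features are used in the experiments.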

Experiments
As mentioned earlier, two different types of investigations were carried out: risk classification and outcome prediction for high-risk patients. Five-fold cross-validation was performed in this study; therefore, 80% of the data was used for training and 20% for testing in each fold. Finally, a weighted average of the five folds was calculated. The numbers of training and test chest X-ray images and clinical records used in the two experiments are listed in Table 2. All of the experiments in this study were conducted using the PyTorch library and Python 3.7 on a computer with an Intel® Xeon® CPU E5-2697 v4 running at 2.30 GHz, 64 GB RAM, and a 16 GB NVIDIA GeForce GTX 1080 GPU.
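A minimal sketch of the five-fold protocol: each fold holds out 20% of the samples for testing and trains on the remaining 80%, and per-fold scores are then averaged. Weighting each fold by its test-set size is an assumption (the text only says "weighted average"), the split below is not stratified by class, and the function names are hypothetical.

```python
import random


def five_fold_indices(n_samples: int, n_folds: int = 5, seed: int = 0):
    """Yield (train_indices, test_indices) for each fold of a shuffled split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx


def weighted_average(fold_scores, fold_sizes):
    """Average per-fold scores, weighting each fold by its number of test samples."""
    return sum(s * n for s, n in zip(fold_scores, fold_sizes)) / sum(fold_sizes)


# With 930 patients, each fold has 186 test samples and 744 training samples.
splits = list(five_fold_indices(930))
```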

Development and Internal Validation of Stacking Classification Model
Eight machine learning models, namely Random Forest [53], Support Vector Machine (SVM) [54], K-nearest neighbor (KNN) [55], AdaBoost [56], XGBoost [56], Gradient Boosting, linear discriminant analysis (LDA) [57], and Logistic Regression [58], were trained on the PCA-reduced CXR image features and the top-ranked clinical features, individually and in combination, for risk and death prediction. The three best-performing models were chosen as base learner models (M1, M2, M3) to develop the stacking model, and a logistic regression classifier was then trained in the second phase as the meta learner model (Ml), yielding separate performance metrics based on the final prediction.
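A minimal sketch of a stacking ensemble with three base learners and a logistic regression meta learner, assuming scikit-learn's `StackingClassifier`. The choice of base learners mirrors one combination reported later in the paper, but the synthetic data, hyperparameters, and the use of out-of-fold probabilities as meta-features are illustrative assumptions rather than the authors' exact configuration.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the combined PCA components and clinical features.
X, y = make_classification(n_samples=400, n_features=15, n_informative=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

base_learners = [
    ("M1", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("M2", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ("M3", LinearDiscriminantAnalysis()),
]
# The meta learner is fitted on out-of-fold probabilities from the base learners.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    stack_method="predict_proba",
    cv=5,
)
stack.fit(X_train, y_train)

test_accuracy = stack.score(X_test, y_test)
positive_probability = stack.predict_proba(X_test)[:, 1]
```

Training the meta learner on out-of-fold predictions, rather than on predictions for data the base learners have already seen, keeps it from simply trusting whichever base learner overfits the most.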

Experiment-01: Risk stratification using CXR Image and Clinical Data
In this experiment, we conducted three sub-experiments to predict the risk of COVID-19 patients. The first was conducted on CXR image features, the second on clinical features, and in the third the combined features from both modalities were used to stratify the risk.

A. Binary Classification (Low vs High Risk) using CXR Images
The CheXNet model was used to extract features from the CXRs, and PCA was then used to reduce the dimensionality of the CXR features. Then, using the reduced feature components and five-fold cross-validation, eight ML classifiers were trained to determine which models performed well in classifying low- and high-risk patients. The stacking model was built using the top three base models and a meta-model, and the performance of the stacking technique for CXR images alone is reported.

B. Binary Classification (Low vs High Risk) using Clinical Data
Using five-fold cross-validation, the Top-5 features (LDH, O2 percentage, Age, WBC, and CRP) identified in the previous stage were tested on eight different ML classifiers to determine which models performed best in classifying low- and high-risk patients. A stacking model was built using the three top-performing algorithms as base models to train a meta learner, and the performance of the meta learner and base models is reported.

C. Binary Classification (Low vs High Risk) using CXR Images and Clinical Data
It was important to investigate how well the reduced CXR feature components and top-ranked clinical features together performed in classifying low- and high-risk patients using different ML classifiers with five-fold cross-validation. This experiment reflects the strength of the multimodal approach proposed in this study compared to the hundreds of approaches published on CXR alone or the tens of approaches published on clinical data alone.

Experiment-02: Death Probability Prediction for High-risk Patients
As in Experiment-01, we conducted three investigations to predict the death outcome of high-risk COVID-19 patients. The first was conducted on CXR image features, the second on clinical features, and in the third the combined features from both modalities were used to separate the patients who died from those who survived.

A. Binary Classification (Survival vs Death) using CXR Images
The features extracted from the CXR images using CheXNet were reduced in dimensionality using PCA and used to train eight different ML classifiers to see which models performed well in predicting the mortality outcome of high-risk patients using five-fold cross-validation. Among the eight models, the three best-performing models were used to train the stacking model, and the results of the base and stacking models are reported.

B. Binary Classification (Survival vs Death) using Clinical Data
The Top-5 clinical features (LDH, O2 percentage, Age, WBC, and CRP) were tested on eight different ML classifiers to determine which models performed best in predicting the mortality outcome among high-risk patients. A stacking model was built using the three top-performing algorithms as base models to train a meta learner, and the performance of the meta learner and base models is reported.

C. Binary Classification (Survival vs Death) using CXR Images and Clinical Data
As a multimodal approach, we investigated the efficacy of the reduced CXR features and top-ranked clinical features in predicting the mortality outcome of high-risk patients using five-fold cross-validation and the same eight models. The Top-3 best-performing models were then used to train the stacking ML model, and the results for the base models and the stacking model are reported.

Development and Validation of Logistic Regression-based Nomogram
Nomograms are a popular graphical scoring technique for condensing a statistical model into a single event probability estimate [59]. A nomogram can be developed using different ML classifiers, e.g., a logistic regression classifier. Logistic regression uses multiple independent predictors (x) to predict the outcome (y) through a linear combination of the predictors. The event probability (P) can then be calculated from this linear prediction. A logistic regression-based nomogram was developed for high-risk patients to stratify their outcomes of survival and death. The nomogram was constructed using logistic regression on the integrated features from CXR and clinical data and the base learners' predictions. Furthermore, calibration curves for model construction and validation were plotted to compare the predicted and actual probabilities of death for high-risk patients. We also used decision curve analysis to determine the ranges of threshold probabilities within which the nomogram is clinically useful.
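A minimal sketch of how a logistic regression model turns a linear prediction into an event probability, P = 1 / (1 + exp(-LP)) with LP = b0 + b1·x1 + ... + bn·xn. The intercept, coefficients, and predictor values below are hypothetical placeholders, not the fitted values from this study.

```python
import math


def event_probability(intercept, coefficients, predictors):
    """Return P = 1 / (1 + exp(-LP)) for a logistic regression model."""
    linear_prediction = intercept + sum(
        b * x for b, x in zip(coefficients, predictors))
    return 1.0 / (1.0 + math.exp(-linear_prediction))


# Hypothetical coefficients for three base-learner probability scores (M1, M2, M3).
coefficients = [2.0, 0.5, 1.5]
low = event_probability(-2.0, coefficients, [0.1, 0.1, 0.1])
high = event_probability(-2.0, coefficients, [0.9, 0.9, 0.9])
```

A nomogram is a graphical rendering of the same calculation: each predictor's contribution to the linear prediction is drawn as a points scale, and the summed points are mapped to the probability axis.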

Performance Metrics
The area under the curve (AUC) from the receiver operating characteristic (ROC) along with Sensitivity, Precision, Accuracy, Specificity, and F1-Score were used to assess the performance of different classifiers.
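Because this study used five-fold cross-validation, the results are based on the complete dataset (the five test folds concatenated). Because the classes had different numbers of instances, weighted metrics per class and overall accuracy were reported. The area under the curve (AUC) was also considered as an evaluation metric. The metrics follow their standard definitions:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
$$\text{Specificity} = \frac{TN}{TN + FP}$$
$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}$$
Here, for a given class, TP (true positive) is the number of instances of that class that are correctly detected, TN (true negative) is the number of instances of the other class that are correctly detected, FP (false positive) is the number of instances of the other class that are misclassified as the given class, and FN (false negative) is the number of instances of the given class that are misclassified as the other class.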
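A minimal sketch that computes accuracy, precision, sensitivity, specificity, and F1-score from the four confusion-matrix counts using their standard definitions. The counts in the example are hypothetical and are not taken from the study's results.

```python
def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute standard classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)             # also called recall
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=80, tn=90, fp=10, fn=20)
```

For class-weighted reporting, the same function can be evaluated once per class (treating that class as positive) and the results averaged with weights proportional to class size.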

Best Features and Their Combination Selection
Thirty-four statistically significant features were used to select the top-ranked 10 features using the random forest feature ranking technique (Figure 5). Table 3 shows the results of testing these top-ranked 10 features with multiple classifiers to determine the best-performing feature combinations. The Gradient Boosting classifier surpasses the other classifiers in binary classification (low- vs. high-risk) when employing the top-ranked 5 features. Using only the Top-5 features (LDH, O2 percentage, WBC, Age, and CRP), Gradient Boosting yields an overall accuracy, weighted sensitivity, precision, specificity, and F1-score of 82.91%, 82.91%, 82.87%, 82.91%, and 82.87%, respectively. Among the Top-10 features, it was also very important to assess the most suitable parameters for the early prediction of high-risk COVID-19 patients.

Risk Prediction of COVID-19 Patients
In this section, the results of three different experiments to predict low- or high-risk COVID-19 patients are reported. The performance of different ML models is reported for CXR images and clinical data separately and in combination. Each of these results is based on five-fold cross-validation.

Performance Analysis using CXR Images
The Gradient Boosting classifier was the best-performing classifier for stratifying low- and high-risk COVID-19 patients. It achieves precision, sensitivity, and F1 scores of 78.41%, 78.48%, and 78.41%, respectively. The stacking model was built using the top three classifiers: Random Forest, KNN, and Gradient Boosting. The stacking model produces slightly better performance, with precision, sensitivity, and F1 scores of 79.5%, 79.53%, and 79.54%, respectively.

Performance Analysis using Clinical Data
The Gradient Boosting classifier outperforms the other classifiers in binary classification, with precision, sensitivity, and F1 scores of 82.81%, 82.8%, and 82.81%, respectively. The stacking model was trained using the top three algorithms (Random Forest, Gradient Boosting, and XGBoost). A logistic regression classifier was used as the meta learner and outperformed the base models, with precision, sensitivity, and F1 scores of 83.01%, 83.87%, and 83.01%, respectively.

Performance Analysis using both CXR Images and Clinical Data
On combined CXR features and clinical data, the Gradient Boosting classifier outperforms the other classifiers, with precision, sensitivity, and F1 scores of 88.81%, 88.81%, and 88.81%, respectively. The stacking model was built using the top three algorithms (Gradient Boosting, LDA, and Random Forest); it outperforms the base models and produces precision, sensitivity, and F1 scores of 89.03%, 90.44%, and 89.03%, respectively. The stacking model demonstrated roughly a 6% improvement when using combined CXR features and top-ranked clinical features. The comparison between different classifiers using different metrics with a 95% confidence interval in the prediction of low- or high-risk patients using CXR features and clinical data, separately and in combination, is shown in Table 4. Figure 6 shows that the combined CXR image features and top-ranked clinical features outperformed the individual modalities, with an AUC of 91.5%. Individually, the CXR image features and the top-ranked clinical features produced AUCs of 82.3% and 85%, respectively, using the stacking model.

Death Probability Prediction for High-risk Patients
In this section, the results of three different experiments to predict the probability of death among high-risk COVID-19 patients are reported. The five-fold performance of different ML models is reported for CXR images and clinical data separately and in combination.

Performance Analysis with CXR Images
The Random Forest classifier outperforms the other seven classifiers in classifying the dead and survived COVID-19 patients, with precision, sensitivity, and F1 scores of 84.83%, 85.02%, and 84.83%, respectively. The stacking model was built using the top three methods (Random Forest, Extra Tree, and Gradient Boosting) and produces precision, sensitivity, and F1 scores of 86.35%, 83.22%, and 86.35%, respectively.

Performance Analysis with Clinical Data
The gradient boosting model outperforms the other seven classifiers in stratifying the survived and dead patients, with precision, sensitivity, and F1 scores of 89.14%, 89.86%, and 89.14%, respectively. The stacking model was trained using the top three models (Random Forest, XGBoost, and Extra Tree). The stacking model beat the base models, achieving 91.2% precision, 91.25% sensitivity, and a 91.2% F1 score.

Performance Analysis using both CXR Images and Clinical Data
The Random Forest classifier outperforms the other models, with precision, sensitivity, and F1 scores of 91.76%, 91.86%, and 91.76%, respectively. The stacking machine learning model was trained using Random Forest, Extra Tree, and Gradient Boosting, and it outperforms the base models with precision, sensitivity, and F1 scores of 92.88%, 93.37%, and 92.88%, respectively. Across all performance metrics, the performance of the stacking model improved by ~6% when using both reduced CXR features and top clinical features (see Table 5). Figure 7 also shows that the combined CXR image features and top-ranked clinical features outperformed the individual modalities, with an AUC of 92.8%. Used individually with the stacking model, the reduced CXR image features and top-ranked clinical features produce AUCs of 88.4% and 91.1%, respectively.
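The AUC values quoted above summarize how well a model's scores rank positive cases above negative ones. As a minimal sketch, the AUC can be computed directly as the fraction of positive/negative pairs in which the positive case receives the higher score (ties count as one half). The function name, labels, and scores below are hypothetical and are not study data.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


# Hypothetical outcome labels (1 = death) and predicted death probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```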

Stacking ML Based Nomogram
Because the logistic regression meta learner performs best in classifying survived and dead patients, a Nomogram that leverages the probability scores of the three best models (Random Forest (M1), Extra Tree (M2), and Gradient Boosting (M3)) was created to estimate the survival and death probabilities of the high-risk group. The link between the probability scores of these base learner models and the likelihood of death in high-risk patients was investigated using multivariate logistic regression analysis (Table 6). A common method for identifying significant features is the z-value, which is calculated from the regression coefficient and its standard error. An independent variable is considered significant when its z-value is high. It can be seen from Table 6 that, of the three probability scores, Extra Tree (M2) is not a very good predictor for COVID-19 patients, whereas Random Forest (M1) and Gradient Boosting (M3) are good predictors. The p-value can also be used to identify significant variables: if p < 0.05, the X-variable has a significant association with the Y-variable. The p-value likewise shows that the Extra Tree model is not a strong predictor. However, model performance was slightly reduced when stacking two models instead of three. Therefore, the Nomogram was created using three models. The nomogram has 6 rows, with rows 1 to 3 reflecting the incorporated variables, as shown in Figure 7. The "Points axis" yields a score for each variable in the death or survived high-risk group. The scores were calculated by adding the points from the three factors (row 4), and the final score was displayed in row 6. To calculate the chance of a patient dying, a line is drawn from the "Total Score" axis to the "Prob axis" (row 5).
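The two calculations discussed above can be illustrated with a short sketch: the z-value as the ratio of a regression coefficient to its standard error (with a two-sided p-value under a standard normal approximation), and the logistic mapping from the three base-model probability scores to a death probability. All coefficients below are hypothetical placeholders, not the fitted values of Table 6, and the point scaling of the nomogram axes is not reproduced.

```python
import math


def wald_z_and_p(coef: float, std_err: float):
    """z-value = coefficient / standard error, with a two-sided normal p-value."""
    z = coef / std_err
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


def death_probability(intercept: float, coefs, base_probs) -> float:
    """Logistic regression over the base-model probability scores (M1, M2, M3)."""
    linear_predictor = intercept + sum(b * m for b, m in zip(coefs, base_probs))
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical coefficient and standard error for one predictor.
z, p = wald_z_and_p(coef=1.96, std_err=1.0)
print(f"z = {z:.2f}, p = {p:.4f}")

# Hypothetical intercept and coefficients for M1, M2, M3 (not from Table 6).
intercept, coefs = -4.0, (3.0, 1.0, 4.0)
prob_mid = death_probability(intercept, coefs, (0.5, 0.5, 0.5))
prob_high = death_probability(intercept, coefs, (0.9, 0.9, 0.9))
print(f"P(death) at M = 0.5: {prob_mid:.2f}; at M = 0.9: {prob_high:.2f}")
```

In this form, a small coefficient relative to its standard error (a low z-value) means the corresponding base model adds little to the linear predictor, which is the sense in which a predictor is described as weak.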
Alternatively, the following formula can be used to calculate the nomogram score:

Figure 9 demonstrates that each predictor model's net benefit was positive (threshold < 0.95), indicating that each predictor contributed to the outcome prediction. The entire model, in particular, provided the best results, necessitating the use of three base models as predictors in the Stacking model.
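For context on the decision curve analysis, net benefit at a threshold probability p_t is commonly defined as TP/n - (FP/n) * p_t / (1 - p_t), where TP and FP are the true and false positives obtained when every patient with predicted probability at or above p_t is treated as high risk. The sketch below assumes this standard definition; the function name and the data are hypothetical, and the study's own implementation is not described in this section.

```python
def net_benefit(labels, probs, threshold: float) -> float:
    """Net benefit of acting on predictions at a given threshold probability."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Hypothetical outcomes (1 = death) and predicted death probabilities.
labels = [1, 1, 0, 0]
probs = [0.9, 0.6, 0.7, 0.2]
nb_model = net_benefit(labels, probs, threshold=0.5)

# "Treat all" reference strategy: every patient is flagged as high risk.
nb_treat_all = net_benefit(labels, [1.0] * len(labels), threshold=0.5)
print(f"model: {nb_model:.2f}, treat-all: {nb_treat_all:.2f}")
```

A decision curve plots this quantity over a range of thresholds for each model alongside the treat-all and treat-none (net benefit of zero) strategies.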

Performance Evaluation of the Model
We compared the actual deaths with the expected deaths among high-risk individuals using the Nomogram score. The proportions of death outcomes in the training set were 91.9% (125/136) for the death group and 8.1% (11/136) for the survived group, as shown in Table 7(A), whereas the proportions of death outcomes in the test set were 91.18% (31/34) for the death group and 8.82% (3/34) for the survived group (Table 7(B)). Actual death rates were significantly different between the two groups (p < 0.001). As a result, this scoring technique can be useful for forecasting patient outcomes.

The backend application is developed in Python using Flask, a robust backend application framework for Python. The cloud application is deployed to an Apache 2.0 HTTP server running on an Ubuntu 20.04 LTS Google Compute Engine (GCE) instance. To optimize server cost, a minimum-configuration GCE instance is used. The GCE server has a 4-core Intel Xeon processor with 8 GB of DDR4 memory and 100 GB of balanced persistent storage. To accommodate the computation-intensive ML models in such a resource-constrained environment, kernel-level configurations of the operating system were modified. These include enabling non-threaded pre-forking for the Apache web server to make additional memory available to the TensorFlow processes.

The web application was created with Flutter, which is based on Google's Dart programming language. In the prototype system, the radiologists/clinicians/users first enter demographic information; the system then asks the user to upload CXR images as well as four biomarkers: LDH, O2 percentage, WBC, and CRP. These inputs are uploaded to the server, where they are pre-processed and passed to the BIO-CXRNET model to determine whether the user is a low- or high-risk patient (Figure 10). The data are analyzed by the AI backend, and the answer is displayed on the screen. The application both displays the findings and saves them in an SQLite-based local database. In summary, the application can assist in promptly assessing the severity risk of COVID patients using a limited number of blood indicators, thereby reducing the burden on the healthcare system.

IV. Discussion
This work presents a multimodal system for predicting the risk of COVID-19-positive patients and subsequently stratifying the potential outcome of high-risk patients. The performance in both experiments was analyzed using CXR images and clinical data separately and in combination. In both experiments, the multimodal approach outperformed the individual modalities. For risk group stratification among COVID-19 patients, CXR and clinical features combined had an accuracy of 89.03%, in comparison to 80.11% and 86.01% for CXR and clinical features, respectively. Furthermore, for outcome prediction in high-risk patients, the multimodal technique outperformed the individual modalities with an accuracy of 92.3%, whereas CXR images and clinical data individually produced accuracies of 89.5% and 90.11%, respectively. The results reported in this work show superior performance compared to some of the state-of-the-art results reported in the literature, as shown in Table 8.

In our previous research [63] on severe acute respiratory syndrome (SARS) [64], Middle East respiratory syndrome (MERS) [65], and COVID-19 [66], we found that older age was a predictor of poor outcomes in COVID-19 patients. LDH indicates tissue/cell death and is therefore a common sign of tissue/cell damage. Serum LDH has been established as a key biomarker for idiopathic pulmonary fibrosis activity and severity. According to Yan et al. [60], an increase in LDH is seen in patients with severe pulmonary interstitial disease and is one of the most important prognostic markers of lung injury. The increase in LDH levels in severely ill COVID-19 patients implies an increase in the severity of lung damage.
CRP testing at admission is associated with the prediction of short-term mortality related to COVID-19-related illnesses, according to research conducted by Lu et al. [67]. CRP is synthesized by hepatocytes that are activated by certain cytokines originating from activated leukocytes, such as those produced by infections, inflammations, or an injury to the tissue. Our study found that increased CRP levels at admission were associated with increased mortality risk among patients with COVID-19. These findings indicate that severe inflammation, or possibly a secondary infection, had developed in these patients, and that empirical antibiotic treatment might be required. The increase in CRP, an important marker of poor prognosis in acute respiratory distress syndrome, reflects a persistent state of inflammation [68,69]. COVID-19 patients have large gray-white lesions in their lungs as the result of this persistent inflammatory response [70].
Based on previous studies, the five biomarkers identified in our investigation are associated with inflammation, immunology, and coagulation function, all of which may have a role in COVID-19 pathogenesis. We hypothesized that the inflammatory response to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection is fundamental to COVID-19 pathogenesis, and that dysregulation of the immune and/or coagulation systems leads to severe clinical outcomes, including acute respiratory distress syndrome (ARDS), coagulopathy, and septic shock, among others. Patients who died showed lower WBC and O2 percentages, as well as higher age, CRP, and LDH values, than survivors. COVID-19 patients with a high mortality risk may benefit from early care based on a complete assessment of the inflammatory response, immunological dysfunction, and coagulopathy. As expected, clinical information together with chest X-ray images helps in the reliable detection of COVID-19 severity and mortality risk.
In addition, our nomogram can be used in a variety of therapeutic scenarios. To the best of our knowledge, it outperforms other models proposed in the literature. Furthermore, the nomogram score served as a quantitative tool for identifying patients with a high mortality risk upon admission and for guiding clinical therapy. COVID-19 individuals were assigned to risk categories based on their hospital admission data. Isolation and treatment of low-risk cases should be done in isolation centers. Survivors from high-risk categories should be admitted to a hospital with an isolation unit to receive complete care. The high-risk group requires close monitoring and should be referred to the ICU for intensive treatment and critical assistance.

V. Conclusion
A multimodal method was proposed that uses a unique machine learning architecture to predict severity and mortality risk in COVID-19 patients from both CXR images and clinical data. The proposed architecture uses CXR images and only five parameters (LDH, O2 percentage, age, WBC, and CRP) and shows outstanding results in distinguishing low- and high-risk COVID-19-positive individuals with very high sensitivity. Furthermore, the proposed nomogram-based approach accurately forecasts the likelihood of death in individuals at high risk. Our nomogram for predicting the prognosis of COVID-19 patients demonstrated good discrimination and is based on various risk indicators. Because the model uses both CXR images and clinical parameters, it can counteract clinicians' criticism of using only radiographic images for prognostic purposes. The model can identify the potential risk of a patient at admission, which can significantly help in hospital resource management. Although the study used data from the initial variants, the clinical biomarkers identified in this work are supported by a large pool of clinical studies on other variants; we therefore expect this model to be equally useful for Omicron and other future variants, which can emerge in the coming winter. As a result, doctors could use this technique to make a quick and fair judgment to optimize patient stratification and management and possibly minimize mortality rates. However, this quantitative tool should be validated in a large-scale, multicenter, multi-country prospective study to further demonstrate its usability in clinical settings.

Figure 1: Overview of the methodology.

Figure 2: Chest X-ray sample images for COVID-19 (A) Low-risk patients, (B) High-risk patients with survival outcomes, and (C) High-risk patients with death outcomes.

Figure 3: Sample X-ray images from the study dataset (A), masks generated by the best-performing DenseNet121 FPN model (B), and the corresponding segmented lungs (C).

Figure 5: Top ten features selected using the random forest feature selection technique.

Figure 6: ROC curves for risk prediction of COVID-19 patients with single and multi-modal data using the stacking ML model.

Figure 7: ROC curves for outcome prediction of high-risk patients with single and multi-modal data using the stacking ML model.

Figure 7 also shows the Nomogram scores for both the survived and death classes. A 50% cutoff of the classification probability corresponds to a Nomogram score of 4.8 (a probability of 0.5), which stratifies the two classes.

Figure 8 shows the calibration plot for both internal and external validation. It is evident from Figure 8 that each calibration curve is very close to the diagonal line, reflecting a reliable model. The AUC values for internal and external validation are 98.1% and 93.8%, respectively, which also reflects the superior performance of the proposed model.

Figure 9: Decision curve analysis comparing different models to predict the death probability of patients with high-risk COVID-19.

Table 1: Summary of statistical characteristics of the study patients

Table 2: Details of the dataset used for training, validation, and testing.

Table 4: Comparison of performance metrics for risk prediction using different ML models and approaches (single mode and multimode)

Table 5: Comparison of performance metrics for death prediction using different ML models and approaches (single mode and multimode)

Table 6: Summary of logistic regression analysis

Table 7: Performance evaluation of the model in the training cohort (A) and testing cohort (B) using Fisher's

Table 8: Comparison with state-of-the-art works in the literature