Identification of Clinical Features Associated with Mortality in COVID-19 Patients

Understanding the clinical features and risk factors associated with COVID-19 mortality is needed to identify critically ill patients early, initiate treatment and prevent deaths. We conducted a retrospective study of COVID-19 patients referred to a tertiary hospital in Iran between March and November 2020. COVID-19-related mortality and its association with clinical features including headache, chest pain, symptoms on computerized tomography (CT), hospitalization, time to infection, history of neurological disorders, having a single or multiple risk factors, fever, myalgia, dizziness, seizure, abdominal pain, nausea, vomiting, diarrhoea and anorexia were investigated. Based on the outcome of this investigation, decision tree and dimension reduction algorithms were used to identify the aforementioned risk factors. Of the 3008 patients with COVID-19 (mean age 59.3 ± 18.7 years, 44% women), 373 died. There was a significant association between COVID-19 mortality and old age, headache, chest pain, low respiratory rate, oxygen saturation < 93%, need for a mechanical ventilator, having symptoms on CT, hospitalization, time to infection, neurological disorders, cardiovascular diseases and having one or multiple risk factors. In contrast, there was no significant association between mortality and gender, fever, myalgia, dizziness, seizure, abdominal pain, nausea, vomiting, diarrhoea and anorexia. Our results might help identify early symptoms related to COVID-19 and better manage patients according to the extracted decision tree. The proposed ML models identified a number of clinical features and risk factors associated with mortality in COVID-19 patients. These models, if implemented in a clinical setting, might help identify patients needing medical attention and care at an early stage. However, more studies are needed to confirm these findings.

devised a useful prediction model of COVID-19 mortality utilizing unbiased computational techniques and detected the most predictive clinical features. Their machine learning (ML) framework was mainly based on three clinical features: minimum oxygen saturation throughout patients' medical encounters, age and type of patient encounter. Their COVID-19 mortality prediction model exhibited competitive accuracy. Although a number of studies have explored the association of mortality with clinical features of COVID-19, those studies did not provide a comprehensive list of clinical features associated with COVID-19 mortality. In addition, most of the predictive COVID-19 ML models were based on Chinese data; hence, they might not be relevant in other parts of the world. In this study, we tried to address these two weaknesses of previous research. We aimed to determine, for the first time, the set of clinical features associated with COVID-19 mortality in Iranian cases using ML approaches.

Methods
In this section, the data collection process, the employed ML model and the conducted statistical tests are presented. A C4.5 decision tree is used as the ML model to predict whether a COVID-19 patient survives or not, given the patient's symptoms and medical conditions.

Study Settings, Population and Recruitments
We collected the medical reports of all COVID-19 patients (n = 3008) who were referred to Semnan hospital in Iran between March 2020 and November 2020. Data on sociodemographic features and clinical factors such as gender, age, month of infection and hospitalization, inpatient department, fever, myalgia, seizures and dizziness were investigated to determine their effects on the mortality of COVID-19 patients. All of the investigated features are categorical except age, blood pressure and oxygen saturation, which are continuous. The data were collected under the direct supervision of registered medical experts. Because data collection is error prone, samples with suspicious values were corrected where possible and discarded otherwise.

ML Models
In this research, the C4.5 decision tree [35] is used for the classification of patients. The C4.5 algorithm builds a decision tree from a set of training data. To create each node of the decision tree, the algorithm selects the feature of the training data that most effectively partitions the training samples. This selection is based on the concept of entropy: attributes that split the samples into purer subsets are selected earlier. The training dataset is then partitioned according to the selected attribute, and several branches are created. This process is repeated in each branch. If all the instances in a subset belong to a single class, a leaf node is created for the decision tree and labelled with the class of those instances. If the instances do not all belong to one class and a new attribute cannot be selected for any reason, C4.5 creates a decision node using the expected value of the class. In addition, dimension reduction algorithms such as PCA [36], PLS [37] and t-SNE [38] were used to show the samples according to their important features. Dimension reduction is one of the major tasks in multivariate analysis. PCA, a linear dimension reduction algorithm, is applied without considering the correlation between the dependent and the independent variables, whereas PLS is applied based on this correlation. The t-SNE algorithm, on the other hand, estimates a similarity measure between pairs of samples in the high- and the low-dimensional spaces.
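The entropy-based attribute selection described above can be illustrated with a minimal Python sketch. It is not the implementation used in this study (the analysis was carried out in MATLAB); the function names, the attribute names `low_spo2` and `fever`, and the four toy records are hypothetical and serve only to show how information gain ranks categorical attributes.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by partitioning the samples on one categorical attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the subsets created by the split.
    remainder = sum(len(part) / len(labels) * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


# Hypothetical toy records, NOT taken from the study dataset.
rows = [
    {"low_spo2": "yes", "fever": "yes"},
    {"low_spo2": "yes", "fever": "no"},
    {"low_spo2": "no", "fever": "yes"},
    {"low_spo2": "no", "fever": "no"},
]
labels = ["died", "died", "survived", "survived"]

# The attribute with the highest gain (purest subsets) becomes the next tree node.
best = max(rows[0], key=lambda name: information_gain(rows, labels, name))
print(best)
```

In this toy example, `low_spo2` separates the two classes perfectly, so it would be chosen before `fever`, whose subsets remain evenly mixed.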

Ethics Approval
The local ethics committee of the Semnan University of Medical Sciences approved this research. The patients were informed about the aims of this research, and written consent was obtained before data collection.

Statistical and ML Analysis
We analysed the dataset features using MATLAB 2018b software. To determine differences between the two patient groups (i.e. alive and dead), the Wilcoxon rank-sum test [39] and Fisher's exact test [40] were used for continuous and categorical data, respectively. The significance threshold for both tests was set at P ≤ 0.05. In C4.5, information gain was employed as the criterion for choosing the attributes to be used as tree nodes. At each tree node, the attribute with minimum entropy was selected to form the children of that node. The number of children is equal to the number of possible values that the selected attribute can take. The size of each node N_i is the number of examples in the sub-tree that has N_i as its root. Only nodes whose size was greater than or equal to the minimal-size-for-split parameter were split. In our experiments, this parameter was set to 4. For C4.5, the minimal size of each leaf node (the number of examples in it) must be set as well. Finally, the last parameter that must be specified is the minimal gain. Only nodes with a gain greater than the minimal gain were considered for splitting. Increasing the minimal gain leads to fewer splits and a smaller decision tree.
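The two pre-pruning conditions described above (minimal size for split and minimal gain) can be sketched as a simple check. This is a hedged illustration rather than the code used in the study: the minimal size of 4 follows the text, whereas the function names, the default minimal gain of 0.01 and the candidate nodes are hypothetical.

```python
def should_split(node_size, best_gain, min_size_for_split=4, min_gain=0.01):
    """Return True when a node is both large enough and informative enough to split.

    min_size_for_split=4 follows the text; min_gain=0.01 is an assumed placeholder.
    """
    return node_size >= min_size_for_split and best_gain > min_gain


def count_splits(nodes, min_gain):
    """Count how many candidate nodes would be split for a given minimal gain."""
    return sum(should_split(size, gain, min_gain=min_gain) for size, gain in nodes)


# Hypothetical candidate nodes given as (node size, best information gain).
candidate_nodes = [(120, 0.20), (40, 0.05), (3, 0.30), (10, 0.008)]

for threshold in (0.0, 0.01, 0.1):
    print(threshold, count_splits(candidate_nodes, threshold))
```

Raising the minimal gain from 0.0 to 0.1 reduces the number of accepted splits in this toy list from three to one, which mirrors the statement that a larger minimal gain yields a smaller tree. The node of size 3 is never split, regardless of its gain, because it falls below the minimal size.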

Results
Of the 3008 patients with COVID-19, 94.5% (2844 cases) were of Iranian nationality and 5.5% (164 cases) were Afghan nationals. Of these, 56% were men and 44% were women, with a mean age (± SD) of 59.3 ± 18.7 years (range 1-100 years). Fig. 1 shows the histogram of COVID-19 casualties for different age intervals. Of the patients who were referred to the hospital during this period, 18.5% required admission to the intensive care unit and the rest were admitted to the isolated and normal wards. Three hundred seventy-three of these 3008 patients died. Three hundred eighty-seven patients (12.9%) with COVID-19 had been in contact with an infected person, and 2621 patients (87.1%) reported no contact with an infected person. About 70.4% of patients came to the hospital on their own, 653 (21.7%) were conveyed to the hospital by pre-hospital emergency services, 199 (6.6%) by private ambulance and 38 (1.3%) by ambulances from other centres.
According to these data, the incidence of COVID-19 infection was high in March 2020, lowest in May and June, peaked in October and declined in November (Fig. 2). Table 1 shows the effect of different features on the mortality rate. Mortality was not significantly different between men (1684 cases) and women (1324 cases). There was a significant association between mortality and the age of patients (P < 0.001), infection time (P < 0.001) and the hospitalization ward (isolated ward, intensive care unit, normal ward) (P < 0.001). Symptoms such as fever, myalgia, dizziness, seizure, abdominal pain, nausea, vomiting, diarrhoea and anorexia were not significantly associated with COVID-19-related mortality (P > 0.05). There was a significant association between mortality and headache in infected patients (P < 0.011). Chest pain was also significantly associated with COVID-19-related mortality (P < 0.045), as was a decreased level of consciousness (P < 0.0001). Respiratory distress, oxygen saturation less than 93%, lower respiratory rate and need for mechanical ventilation were associated with COVID-19-related mortality (P < 0.004, P < 0.001, P < 0.001 and P < 0.001, respectively).
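To make explicit how a P value for a categorical feature is obtained, the following Python sketch computes a two-sided Fisher's exact test for a 2x2 table (symptom present or absent versus died or survived). It is an illustration only: the analysis in this study was run in MATLAB, the function name is hypothetical, and the counts are invented toy values that do not come from Table 1.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables that are no more likely than the observed one.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-7)
    )


# Hypothetical toy counts (NOT study data):
#                   died  survived
# symptom present     8       2
# symptom absent      1       5
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(round(p_value, 4))  # 0.035
```

With these toy counts the P value is about 0.035, which would be reported as significant under the P ≤ 0.05 threshold used in this study.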

The Effect of Early Symptoms on the Outcome of Patients' Deaths
Opium addiction, smoking status, pregnancy, diabetes mellitus, underlying cancer, liver disease, lung disease, asthma, kidney disease, chronic haematological diseases, other chronic diseases and receiving immunosuppressive medicines had no association with COVID-19-related mortality. Underlying cardiovascular disease and neurological diseases were associated with COVID-19-related mortality (P < 0.023, P < 0.003 and P < 0.012, respectively). The presence of symptoms on CT scan was significantly related to mortality in COVID-19 cases (P < 0.001). Having a risk factor was significantly associated with mortality due to COVID-19 (P < 0.003), as was having multiple risk factors (P < 0.002). The statistical test results presented above reveal the symptoms with a significant relation to COVID-19 mortality. These symptoms can be used as features to form a decision tree for predicting COVID-19 mortality. An example of this type of decision tree is shown in Fig. 3. The results of evaluating the prepared decision tree on our dataset are available in Table 2. The evaluation was based on accuracy [41], sensitivity [42], specificity [43], precision [44] and F1-score [45]. In Fig. 4, the patients are shown according to their important features extracted by the PCA, PLS and t-SNE algorithms. According to this figure, although PCA performs better than the other two, the cases are clearly not well separable.
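For reference, the five evaluation measures named above can be computed from the four cells of a confusion matrix, as in the following Python sketch. The function name and the counts are hypothetical and do not reproduce Table 2; death is assumed to be the positive class, and all denominators are assumed to be non-zero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts.

    tp/fn: deceased patients predicted as deceased/surviving (positive class assumed = death).
    tn/fp: surviving patients predicted as surviving/deceased.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # also called recall or true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)    # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for illustration only (NOT the values behind Table 2).
metrics = classification_metrics(tp=30, fp=10, tn=50, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because deaths are much rarer than survivals in a cohort such as this one (373 of 3008), accuracy alone can look high even when few deaths are detected, which is why sensitivity, precision and F1-score are reported alongside it.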

Discussion
The main findings of our study are the significant association of mortality due to COVID-19 with factors such as age, headache, chest pain, low respiratory rate, oxygen saturation less than 93%, need for a mechanical ventilator, having symptoms on CT, hospitalization in wards and time to infection. In addition, neurological disorders, cardiovascular diseases and having risk factor(s) were associated with COVID-19 mortality. Interestingly, there was no significant association between mortality and gender, fever, myalgia, dizziness, seizure, abdominal pain, nausea, vomiting, diarrhoea and anorexia. As another contribution, this paper is the first to investigate the association of history of neurological disorders, having risk factor(s), dizziness, seizure and abdominal pain with COVID-19-related mortality. The significant association between age and COVID-19-related mortality in our study is in line with previous studies conducted by Zhou et al. [46], Pettit et al. [47], Chen et al. [48] and Iftimie et al. [49] and in contrast to De Smet et al. [50], Sun et al. [51] and Li et al. [52]. Immune impairment and the increased likelihood of developing cardiovascular and respiratory diseases may be the common link between old age and COVID-19-related mortality [53,54]. The observed association between underlying cardiovascular diseases and COVID-19-related mortality in our study is in line with Chen et al. [55], Soares et al. [56] and Ruan et al. [57], but contrary to the findings of Iftimie et al. [58], Li et al. [59] and Ciardullo et al. [60]. We found underlying high blood pressure to be associated with COVID-19 mortality, which is in line with the finding of Li et al. [59] and in contrast with the findings of Rawl et al. [61], Pei et al. [62], Sun et al. [51] and Ciardullo et al. [60]. Hospitalization in wards was associated with COVID-19-related mortality, consistent with the findings of Chen et al. [59], who found a relationship between ICU admission and mortality. The association between the need for mechanical ventilation and COVID-19-related mortality is in line with the findings of Chen et al. [59] and Zhou et al. [46]. The association of low oxygen saturation and low respiratory rate with mortality is in contrast with the findings of Sun et al. [51].
Fig. 3 An example of decision tree
In our previous study, anorexia, dry cough, anosmia and history of cancer were associated with COVID-19-related mortality [63], but in this study, we observed no relationship between COVID-19 mortality and cancer. This may be due to the different study populations, which came from two different provinces of the same country. Anorexia showed a significant positive relationship with COVID-19-related mortality in the study by Rawl et al. [61]. Regarding comorbidities, finding no significant association between cancer and COVID-19-related mortality is in line with the findings of Lee et al. [64] but in contrast with those of Iftimie et al. [49], Mehta et al. [65], Dai et al. [66], Westblade et al. [67], Melo et al. [68] and Rüthrich et al. [69]. Different demographic features could explain this discrepancy. Finding no association between gender and COVID-19-related mortality agrees with Ruan et al. [57], Mehta et al. [65] and Sun et al. [51]. The absence of an association between fever and COVID-19-related mortality in our study agrees with our previous research [63], but contrasts with the findings of Iftimie et al. [49]. Myalgia, diarrhoea, nausea and vomiting were not predictors of mortality in our cohort, which contrasts with the findings of Zhou et al. [46]. Some of the typical clinical characteristics of COVID-19 patients with mortality are summarized in Table 3.
The most important strength of this research is that it investigates the impact of some new features on the mortality rate of COVID-19 patients. Another important strength is the large amount of data used. However, our results should be interpreted in light of the following limitation. The patients were recruited from a specific region, and our results might not apply in other countries, as factors associated with mortality may differ between regions [70]. Future research with long-term follow-up is necessary to investigate the mortality rate of COVID-19 in patients with heart or kidney diseases.

Conclusion
In this research, we investigated, for the first time, the effect of several risk factors and symptoms on the COVID-19 mortality rate. Our results show a significant association between mortality and risk factors such as old age, headache, chest pain, low respiratory rate, oxygen saturation less than 93%, need for a mechanical ventilator, having symptoms on CT, hospitalization in wards, time to infection, neurological disorders, cardiovascular diseases and having one or multiple risk factors. In contrast, there is no significant association between mortality and gender, fever, myalgia, dizziness, seizure, abdominal pain, nausea, vomiting, diarrhoea and anorexia. More studies are needed to confirm these findings.