Introduction

The outbreak of disease caused by the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was declared a pandemic by the World Health Organization on March 11, 2020 [1]. So far, the factors that put individuals at a higher risk of poor outcomes from COVID-19 are poorly understood. Previously reported studies have weaknesses in their design [2,3,4,5,6], including relatively small sample sizes and analysis of single-center data, and most consider only hospitalized patients, without providing data on the general population [7] or on frailer groups, in particular those institutionalized in nursing homes. Furthermore, these studies have focused on disease severity rather than mortality.

Although it seems clear that older patients with chronic medical conditions are at a higher risk of being infected [8], the clinical characteristics and case fatality of COVID-19 need to be verified in countries with different demographic characteristics before these findings can be generalized. Furthermore, some drugs taken at baseline as treatment for chronic medical conditions may affect the course of the disease. Early identification of the people at highest risk of deteriorating would help physicians detect vulnerable groups, administer preventive therapies or vaccinations, and minimize further spread of the infection. Effective patient risk stratification is essential to optimize care and the use of healthcare resources. Prediction models for the general population are needed to detect target groups and to guide medical staff in triaging patients and allocating finite healthcare resources [9].

The aim of this study was to identify factors associated with risk of death among patients diagnosed with COVID-19 in the Basque Country and thereby create and validate prediction scores.

Methods

This is a retrospective study of a cohort of patients diagnosed with COVID-19 in the Basque Country, based on data from the electronic database and health records of the Basque Health Service-Osakidetza. This Spanish region has a population of 2.178 million people, the vast majority of whom are entitled to healthcare under Osakidetza. The Basque Health System is divided into 13 integrated healthcare organizations (IHOs), each gathering all primary and hospital care resources in a given area under the same administrative management.

All patients included in this study were residents of the Basque Country with laboratory-confirmed SARS-CoV-2 infection, defined as a positive result on the reverse transcriptase–polymerase chain reaction assay for SARS-CoV-2 or a positive IgM or IgG antibody test performed because of symptoms suggestive of the disease or contact with a positive case, from February to May 22, 2020. No patients were excluded. The study protocol was approved by the Ethics Committee of the Basque Country (reference PI2020059). All patient data were kept confidential.

All data on patients under the care of Osakidetza are recorded in a unified electronic database. Analysts retrieved data on all positive cases detected during the study period, including sociodemographic data (age, sex, place of residence); baseline comorbidities (all those considered in the Charlson Comorbidity Index [10], plus angina, arrhythmia, arterial hypertension, dyslipidemia, asthma, bronchiectasis, cystic fibrosis, interstitial lung disease, lymphoma, leukemia, coagulopathy, inflammatory bowel disease and gastrointestinal bleeding); baseline treatments (based on the Anatomical Therapeutic Chemical [ATC] classification system) [11,12,13]; other background data concerning care provided in hospital or primary care settings, including dates of hospital admission and discharge and whether patients were admitted to an intensive care unit (ICU); and vital status. Comorbidities were identified from International Statistical Classification of Diseases and Related Health Problems (ICD) codes, ICD-9 or ICD-10, in patients’ records at baseline [14].

We grouped comorbidities in the following way: cardiovascular disease (including myocardial infarction, angina, arrhythmia, congestive heart failure and peripheral vascular disease); cerebrovascular disease, hemiplegia and/or paraplegia; arterial hypertension; dyslipidemia; dementia; respiratory disease (chronic obstructive pulmonary disease [COPD], bronchiectasis, chronic bronchial infection); asthma; liver disease (mild, moderate or severe liver disease); diabetes (with or without organ damage); kidney disease; cancer (malignant tumor, metastatic solid tumor, lymphoma); rheumatic disease; peptic ulcer; inflammatory bowel disease; and coagulopathies.

Regarding baseline medication, we selected drugs based on ATC codes [11]. Baseline treatment was defined as any drug that was prescribed before the patient was diagnosed with SARS-CoV-2 infection and that had no end date. Online Resource Table 1 summarizes the grouping of baseline treatments included in our study.

Data identifying people living in a nursing home were obtained from the Basque Health Department. The only outcome of the study was death during the study period. All patients were followed up until June 29, 2020.

Additionally, we studied the validity of our electronic database by comparing the data obtained from it for a subset of variables with the information provided by a group of trained reviewers who retrieved the same information, item by item, from the electronic health records of the same patients. The results of this sub-study are summarized in Online Resource Table 2. The rate of agreement was at least 93.33%, and in most cases, there was full agreement (100%).

Statistical analysis

As described in the literature on community-acquired pneumonia, we found differences in baseline characteristics and clinical course between patients from nursing homes and those from the general population. For this reason, we divided the sample (general population or nursing home residents) and performed all the analyses separately for these two groups.

Additionally, each sample was randomly divided into two subsamples, for derivation and validation purposes (60% and 40% of the entire sample, respectively).
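As an illustration only, the random 60/40 division within each stratum could be sketched as follows. The function name, the fixed seed and the integer patient identifiers are assumptions made for this example; the study's own analyses were run in SAS and R, and the text does not describe how the random split was generated.

```python
import random


def split_derivation_validation(patient_ids, derivation_fraction=0.6, seed=2020):
    """Randomly split one stratum into derivation and validation subsamples.

    The seed is an arbitrary choice (assumption) so that the split is reproducible.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_derivation = round(len(ids) * derivation_fraction)
    return ids[:n_derivation], ids[n_derivation:]


# Hypothetical identifiers, sized like the two strata reported in the Results.
general_derivation, general_validation = split_derivation_validation(range(15201))
nursing_derivation, nursing_validation = split_derivation_validation(range(3567))
```

Splitting each stratum separately keeps the nursing home and general population analyses independent, as described above.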

Descriptive statistics were generated including frequency tables for categorical variables and means, standard deviations (SDs), medians and interquartile ranges (IQRs) for continuous variables. Patient characteristics were compared between the two subsamples (derivation vs. validation) using Chi-square or Fisher’s exact tests for categorical variables, and Student’s t test or nonparametric Wilcoxon tests for continuous variables.

Univariable logistic regression models were first built using the derivation samples to assess the significance of each potential risk factor. In these models, mortality was the dependent variable and all candidate predictive variables (described previously) were the independent variables. Independent variables with p < 0.20 in the univariable analyses were then considered as potential independent variables in the multivariable analysis, for which multilevel analyses with generalized estimating equations were performed, taking the IHO into account. Potential interactions between variables were also examined. In the final models, only factors with p < 0.05 were retained. Odds ratios (ORs) and 95% confidence intervals (CIs) were calculated. The predictive accuracy of the model was determined by calculating areas under the ROC curve (AUCs) [15].
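The two summary quantities named above can be sketched in a few lines. This is a minimal illustration, not the study's SAS or R code: it assumes a Wald-type interval on the log-odds scale for the OR, and it computes the AUC as the proportion of (deceased, survivor) pairs in which the deceased patient has the higher predicted risk. The function names and the toy data are hypothetical.

```python
import math


def odds_ratio_with_ci(beta, standard_error, z=1.96):
    """Convert a logistic regression coefficient into an OR with a Wald 95% CI (assumed interval type)."""
    return (
        math.exp(beta),
        math.exp(beta - z * standard_error),
        math.exp(beta + z * standard_error),
    )


def auc(predicted_risk, died):
    """Area under the ROC curve via pairwise comparison; ties count one half."""
    cases = [p for p, d in zip(predicted_risk, died) if d]
    controls = [p for p, d in zip(predicted_risk, died) if not d]
    if not cases or not controls:
        raise ValueError("AUC needs at least one death and one survivor")
    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0
            elif case_risk == control_risk:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Toy data for illustration only (not study data).
toy_risk = [0.9, 0.8, 0.3, 0.2, 0.6, 0.1]
toy_died = [True, True, False, False, False, True]
toy_auc = auc(toy_risk, toy_died)
```

The pairwise loop is quadratic in the sample size; for a cohort of this size a rank-based formula would be preferable, but the pairwise form makes the interpretation of the AUC explicit.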

To develop each predictive risk score, we first assigned a weight to each risk factor in relation to its β parameter in the multilevel models. We then summed the weights of the risk factors present in each patient, with higher scores indicating a greater likelihood of death. The predictive accuracy of the mortality risk score was assessed using the AUC [15] in both derivation and validation samples (external validation). Furthermore, we sought to validate the risk scores by K-fold cross-validation [16, 17], which uses part of the available data of the derivation sample to fit the model and a different part to test it (internal validation).
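A minimal sketch of how weights might be derived from β parameters and summed into a score is given below. The scaling rule (divide each coefficient by the smallest absolute coefficient and round to the nearest integer) is a common convention and is an assumption here, since the text does not state the exact rule used. The coefficients are hypothetical values chosen for illustration, not the study's estimates.

```python
def weights_from_betas(betas):
    """Scale each coefficient by the smallest absolute coefficient and round.

    Assumed weighting scheme; the source only says weights were assigned
    'in relation to' each beta parameter.
    """
    reference = min(abs(b) for b in betas.values() if b != 0)
    return {factor: round(b / reference) for factor, b in betas.items()}


def risk_score(patient_factors, weights):
    """Sum the weights of the risk factors present in one patient."""
    return sum(weights[factor] for factor in patient_factors)


# Hypothetical coefficients (not the study's estimates).
example_betas = {
    "male": 0.45,
    "age_over_80": 0.90,
    "dementia": 0.70,
    "anticoagulants": -0.50,  # protective factor, so its weight is negative
}
example_weights = weights_from_betas(example_betas)
example_score = risk_score(["male", "age_over_80", "dementia"], example_weights)
```

Integer weights make the score easy to compute by hand at the point of care, at the cost of a small loss of precision relative to the fitted model.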

Having developed the mortality risk score, we categorized the score into different levels of risk. The optimal thresholds on the continuous risk scores were determined with the catpredi function of the R package CatPredi using the genetic algorithm [18]. The performance of each risk classification was studied by comparing the mortality rate between categories and by using multilevel analyses with generalized estimating equations and AUCs, in both derivation and validation samples. Finally, Kaplan–Meier curves were constructed for each risk category, and comparisons were performed with the log-rank test.
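For readers unfamiliar with the Kaplan–Meier estimator, the product-limit calculation for a single risk category can be sketched as follows. This covers only the survival curve, not the log-rank comparison, and the function name and toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one death.

    times  -- follow-up time for each patient (e.g. days since diagnosis)
    events -- True if the patient died at that time, False if censored
    """
    observations = sorted(zip(times, events))
    at_risk = len(observations)
    survival = 1.0
    curve = []
    i = 0
    while i < len(observations):
        t = observations[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(observations) and observations[i][0] == t:
            deaths += 1 if observations[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            # Patients censored at t still count as at risk at t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy follow-up data for illustration only (not study data).
example_curve = kaplan_meier(
    [2, 3, 3, 5, 8, 8],
    [True, False, True, True, False, False],
)
```

In practice one such curve would be computed per risk category and per sample (derivation and validation).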

All effects were considered significant at p < 0.05. All statistical analyses were performed using SAS for Windows, version 9.4 (SAS Institute, Cary, NC), and R version 4.0.2.

Results

In this study, 18,768 people were classified as having COVID-19 based on test results. Overall, 5775 (30.77%) were admitted to a hospital during the study period and 448 (2.39%) were admitted to an ICU. This total included 3567 patients from nursing homes; their mean age was 84.15 (SD: 10.88) years and 69.67% were women. In this sample, 871 (24.42%) were admitted to a hospital during the study period and 15 (0.42%) to an ICU. In the general population sample (n = 15,201), the mean age was 53.77 (SD: 17.50) years and 59.55% were women; 4904 (32.26%) were admitted to a hospital during the study period and 433 (2.85%) to an ICU. Among the general population, 10,002 patients (65.80%) were positive on the pharyngeal swab test and 5199 patients (34.20%) on the IgG/IgM test, whereas among nursing home residents, 71.32% were positive on the pharyngeal swab test and 28.68% on the IgG/IgM test, these differences being statistically significant (p < 0.0001). As expected, patients in nursing homes presented a higher proportion of all kinds of comorbidities, mainly cardiovascular and cerebrovascular disease, diabetes, COPD and dementia. Descriptive data for all patients are summarized in Online Resource Table 3.
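The reported difference in test type between the two groups can be checked with a simple 2×2 chi-square calculation. This is a sketch under stated assumptions: the nursing home counts (2544 and 1023) are reconstructed here from the reported percentages of 3567 residents, and a Pearson chi-square without continuity correction is assumed, since the text does not say which variant was used for this comparison.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


general = (10002, 5199)       # swab-positive, antibody-positive (as reported)
nursing_home = (2544, 1023)   # reconstructed from 71.32% and 28.68% of 3567 (assumption)
statistic, p_value = chi_square_2x2(*general, *nursing_home)
```

Under these assumptions the resulting p-value falls below 0.0001, in line with the significance level reported above.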

The univariable analysis showed that, for nursing home residents, being older or male, having cardiovascular, cerebrovascular or kidney disease, arterial hypertension or dementia, being admitted to a hospital for COVID-19, and taking diuretics, cardiovascular medications or azithromycin at baseline were related to death during follow-up, while taking NSAIDs, anticoagulants or lipid-lowering drugs was protective. For patients from the general population, predictors of death in the univariable analysis were again older age, male sex and being admitted to a hospital for COVID-19, as well as having been admitted to a hospital in the previous month for any reason. Most comorbidities, except asthma and inflammatory bowel disease, and all baseline treatments studied increased the risk, except the use of NSAIDs, which was again protective (Table 1).

Table 1 Univariable analyses to predict mortality (derivation samples)

In the multivariable model of nursing home residents, predictors of mortality were being male, being older than 80 years, being admitted to a hospital for COVID-19, having cardiovascular disease, kidney disease or dementia at baseline, and the patient’s IHO. On the other hand, taking anticoagulants or lipid-lowering drugs at baseline was found to be protective. The AUC was 0.754 for this model overall and 0.742 not including the IHO in the model (Table 2).

Table 2 Multivariable analysis for prediction of death using multilevel analysis (derivation sample)

For patients from the general population, predictors of mortality were being older than 60 years (the risk increasing with age), being male, having been admitted to a hospital in the month before admission for any cause, and being admitted to a hospital for COVID-19, as well as any of the following comorbidities: cardiovascular disease, dementia, respiratory disease, liver disease, diabetes with organ damage, or cancer. Of the baseline medications considered, use of anticoagulants was again protective. The AUC for this model was 0.941 (Table 2).

Some of the variables that showed a predictive or protective relationship with mortality in the univariable analysis did not appear in the multivariable models, or even reversed their relationship, because of confounding with the other variables included in the final models. Additionally, patients with dementia and nursing home residents were admitted to the ICU to a lesser extent than the general population because of resource limitation. We performed a multivariate logistic regression to study differences in ICU admission according to whether the patient was a nursing home resident or from the general population, adjusting for dementia, sex and age, and we concluded that the risk of ICU admission was significantly lower in nursing home residents, both among patients with dementia (OR (95% CI) = 0.10 (0.01–0.94), p = 0.0443) and without dementia (OR (95% CI) = 0.25 (0.15–0.44), p < 0.0001).

For each category of a significant predictor, we assigned a weight and created risk scores for each sample. For the continuous risk scores, the AUCs in the nursing home sample were 0.754 in the derivation and 0.717 in the validation sample and in the general population sample were 0.941 in the derivation and 0.938 in the validation sample. Similar AUCs were obtained by K-fold cross validation (0.753 and 0.933, respectively) (Table 2).

Table 3 shows the division of these continuous risk scores into categories indicating a low to a very high risk of dying in the short term. The AUC for the risk categories for nursing home residents was 0.754 in the derivation and 0.714 in the validation sample, while for the general population, it was 0.916 in the derivation and 0.914 in the validation sample.

Table 3 Risk groups of short-term mortality in the derivation and validation samples of both groups using multilevel analysis

Figure 1 shows the Kaplan–Meier curves for risk categories in each sample. The log-rank test detected significant differences between all risk categories in both derivation and validation samples, except between those with scores of 4–5 and 6–10 in the nursing home resident validation sample (p = 0.612).

Fig. 1 Kaplan–Meier curves of short-term mortality according to the risk classes in the derivation and validation samples. The log-rank test detected statistically significant differences between all risk classes in both derivation and validation samples, except between the risk classes with score 4–5 and score 6–10 in the validation sample of nursing home residents (p = 0.612)

Discussion

This study analyses outcomes in a sample of people with laboratory-confirmed SARS-CoV-2 infection from an entire region, separating nursing home residents from the general population, seeking to identify predictors of short-term mortality in a large cohort including patients who were and were not hospitalized. We have succeeded in identifying a set of comorbidities and baseline treatments related to death with a good predictive capacity for people from the general population.

To date, several meta-analyses have explored the relationship between COVID-19 and mortality [19, 20]. In all cases, a potential weakness is heterogeneity in the data, and all these analyses have focused on hospitalized patients, using laboratory test results, which were not uniformly selected and evaluated. Our study confirms previously published findings in that advanced age, male sex and comorbidities are associated with a higher risk of mortality. Additionally, the present study also identifies previous hospitalizations and some chronic baseline treatments as associated with death from COVID-19.

Most of the patients in the general population and nursing homes were elderly men with multiple comorbidities, in agreement with previous studies [2,3,4,5,6, 8]. It has been speculated that older patients with chronic diseases are more likely to die of COVID-19, because age-related alterations in immune function weaken the response to SARS-CoV-2 and hence worsen outcomes [21].

A higher proportion of men than women died, and this could be partially explained by the stronger effect of older age among men. Circulating sex hormones in males and females might influence susceptibility to COVID-19 infection, as shown in a previous study, because they modulate the responses of adaptive and innate immunity [22].

Some recent meta-analyses have assessed the prevalence of comorbidities in patients with COVID-19 [2, 8, 19]; however, not all comorbidities have the same strength as predictive risk factors for mortality. Our study showed that people with underlying cardiovascular disease or dementia are the two groups most likely to die.

The mechanisms underlying the association between cardiovascular disease and COVID-19 might be connected to infection-related demand ischemia that evolves into myocardial injury and dysfunction, and there is evidence of direct viral infection of the myocardium [23]. Regarding dementia as one of the most powerful risk factors for death, our findings are consistent with other studies [24], and it is plausible that respiratory failure, frequent in COVID-19, masks the atypical symptoms in patients with dementia, leading to a failure to recognize the need for medical attention. Furthermore, patients with dementia and nursing home residents were admitted to the ICU to a lesser extent than the general population because of resource limitation. In addition, the physical and cognitive impairment suffered by these patients, together with loneliness and lockdown, worsens their prognosis, so the help of a geriatrician could be valuable.

Comorbidities such as COPD, diabetes, chronic liver disease and cancer were only significant in multivariate analyses for the general population. In COPD, inflammation of the lung parenchyma and expiratory airflow limitation may cause respiratory failure, favoring virus superinfection with SARS-CoV-2 [25]. Diabetes is one of the most prevalent underlying conditions in COVID-19 patients. Although the mechanism is not entirely clear, it is suspected that the exacerbated proinflammatory cascade and impaired immune response are involved in this association [26,27,28]. Despite the low prevalence of chronic liver disease in our patients, consistent with the findings of other studies, this condition was also associated with higher mortality [29,30,31]. It seems that patients with chronic liver disease are not at greater risk of acquiring the infection, but do have a poorer prognosis once infected.

Patients with cancer are more susceptible to infection because of their systemic immunosuppressive state caused by malignancy and also have a higher risk of mortality [32,33,34]. Patients with chronic kidney disease are also more vulnerable to COVID-19, and the already impaired kidney function may deteriorate [35, 36].

Hospital admission was associated with a poorer prognosis and higher mortality, as was admission in the previous month. This is clearly related to a more serious presentation of the infection, as reported for other infectious diseases [37].

Another important finding of our study was the protective role of some long-term medications, namely, anticoagulants and statins, as noted by other authors [38,39,40]. COVID-19 is an inflammatory and prothrombotic disease, and hence, chronic anticoagulation may well provide a real defense against thrombosis [40]. The potential beneficial effects of statins in COVID-19 could be due to their well-known anti-inflammatory properties, and statins might also regulate virus replication, exerting a protective effect [38, 39]. The use of statins and anticoagulants increases with age up to 89 years; from the age of 90, the percentage of users decreases in both populations (general and nursing home). This decrease could be related to functional and cognitive deterioration in elderly patients.

Routine prediction rules used in general wards and ICUs are not able to accurately assess the severity and/or mortality of COVID-19. New validated clinical prediction rules are required for patient stratification [9]. Our rules, based only on variables that are easily accessible and interpretable at the time of diagnosis, can identify seriously ill patients with COVID-19 who are at risk of death. Using data routinely collected in the medical record, we can distinguish patients at high risk (score > 11 for nursing home residents, or > 9 for the general population) from those at low risk. Patients at high risk should be hospitalized and closely monitored, while low-risk individuals could be treated as outpatients under surveillance. To our knowledge, this is the first such prediction rule that achieves this goal. It could help physicians to identify "high-risk groups". These groups should be prioritized if a vaccine becomes available, given the high mortality associated with COVID-19 in combination with these chronic conditions.

Strengths of this study include the large sample size, even for nursing home residents, homogeneity of the data, lack of reliance on data abstractors, avoiding potential bias, and development of predictive models following TRIPOD guidelines [41]. As for limitations of the study, we recognize that the analysis was restricted to a limited number of variables for which we were confident of the validity of the data and we have confirmed this validity. Furthermore, though all cases were COVID-19 positive, it was not verified that the cause of death was unequivocally the SARS-CoV-2 infection in all cases. Finally, this analysis focuses on one region, not the whole of our country or a larger geographical area, and therefore, other studies should be conducted to check the external validity of our models, and thereby, their generalizability.

In conclusion, this study provides for the first time two separate clinical prediction rules for COVID-19 positive individuals from the general population and from nursing homes, using factors related to mortality, that have fairly good predictive value and could be used by general practitioners as they require only basic patient information.