Introduction

In November 2019, a number of patients infected by a betacoronavirus appeared in Wuhan (China), and in December, an outbreak of pneumonia associated with a new type of coronavirus (CoV) was reported. On the 12th of February 2020, the World Health Organization (WHO) classified the illness as COVID-19 [1]. The first known case in Spain appeared on the 31st of January, and a few weeks later, the first case was diagnosed in the Basque Country. In view of the rapid spread of the virus [2], a State of Alarm was declared by the Spanish Government on the 13th of March, and a series of measures were introduced to restrict movement with the aim of containing its propagation [3].

Regarding risk factors, older age, male sex, and comorbidities such as arterial hypertension (HTN), diabetes, cardiovascular, lung, and cerebrovascular disease [4], immunosuppressive illnesses [5], high body mass index (BMI) [6], and cancer [4] have all been associated with poorer outcomes or more severe COVID-19. The WHO warned of higher mortality rates among patients who smoked [7], and a history of smoking has been associated with a higher probability of admission to an intensive care unit (ICU), although not with higher mortality [8]. Various studies indicate that the use of certain drugs is also associated with prognosis, among them angiotensin-converting enzyme inhibitors (ACE inhibitors) and angiotensin II type 1 receptor blockers (ARBs) [5].

Regarding pneumococcal polysaccharide and influenza vaccination, a certain degree of protection has been observed. The Toll-like receptor is of fundamental importance in the binding of respiratory single-stranded RNA viruses such as SARS-CoV-2, which may help to explain this cross-protection [9].

To offer an individualized prediction of the risk of developing the disease, numerous publications have analyzed clinical characteristics and proposed predictive models for the prognosis of patients with COVID-19 [10,11,12]. Such models are of great use in clinical decision-making and support better and earlier handling of cases by primary care or hospital emergency services, which can improve recovery and reduce mortality. As a result, fewer patients require hospitalization, which helps to avoid saturation of the health system, allows a more efficient use of its resources, and reduces the consequences of late patient referrals.

The Basque Health Service-Osakidetza has a single, uniform system for the collection, storage, and use of data obtained during clinical care. Through the present study, we aim to identify the epidemiological characteristics of infected patients in the general population in order to establish patterns and systems for early referral from primary care to hospitals, for the activation of emergency units, and for improving clinical decisions by the emergency services.

Methodology

This is a retrospective study of a cohort of patients diagnosed with COVID-19 in the Basque Country, based on data from the electronic database and health records of the Basque Health Service-Osakidetza. This Spanish region has a population of 2.178 million people, the vast majority of whom are entitled to healthcare under Osakidetza. The Basque Health System is divided into 13 integrated healthcare organizations (IHOs), each gathering all primary and hospital care resources in a given area under the same administrative management.

All patients included in this study were residents in the Basque Country with SARS-CoV-2 infection that was laboratory-confirmed by a positive result on the reverse transcriptase-polymerase chain reaction assay for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) or a positive IgM or IgG antibody test performed due to symptoms suggestive of the disease or having had contact with a positive case, from February to September 25th, 2020. No patients were excluded. The study protocol was approved by the Ethics Committee of the Basque Country (reference PI2020059). All patient data were kept confidential.

All data on patients under the care of Osakidetza are recorded in a unified electronic database. Analysts retrieved data from all positive cases detected during the study period, including sociodemographic data (age, sex, and place of residence), smoking habits, BMI, hospital admission in the month before the COVID-19 diagnosis, baseline comorbidities [all those considered in Charlson’s Comorbidity Index [13] plus angina, arrhythmia, arterial hypertension, dyslipidemia, asthma, bronchiectasis, cystic fibrosis, interstitial lung disease, lymphoma, leukemia, coagulopathy, inflammatory bowel disease, and gastrointestinal bleeding], flu vaccination, pneumococcal vaccination, baseline treatments [based on the Anatomical Therapeutic Chemical (ATC) classification system] [14], and other background data concerning care provided in hospital or primary care settings, including dates of hospital admission. Comorbidities were identified based on International Statistical Classification of Diseases and Related Health Problems (ICD) codes, ICD-9 or ICD-10, in patients’ records at baseline [15].

We grouped comorbidities as follows: cardiovascular diseases (including myocardial infarction, angina, arrhythmia, congestive heart failure, and peripheral vascular disease); cerebrovascular disease, hemiplegia and/or paraplegia; arterial hypertension; dyslipidemia; dementia; respiratory disease [chronic obstructive pulmonary disease (COPD), bronchiectasis, chronic bronchial infection]; asthma; liver disease (mild, moderate, or severe liver disease); diabetes (with or without organ damage); kidney disease; cancer (malignant tumor, metastatic solid tumor, lymphoma); rheumatic disease; peptic ulcer; inflammatory bowel disease; and coagulopathies.

Regarding baseline medication, we selected drugs based on ATC codes [14]. Baseline treatment was defined as any drug that was prescribed before the patient was diagnosed with SARS-CoV-2 infection and that had no end date. A summary of the grouping of baseline treatments included in this study is provided elsewhere [16].

Data identifying people living in a nursing home were obtained from the Basque Health Department. The outcome of the study was hospital admission during the study period.

Additionally, we studied the validity of our electronic database by comparing the data obtained from it for a subset of variables with the information provided by a group of trained reviewers who retrieved the same information, item by item, from the electronic health records of the same patients. The results are provided elsewhere [16].

Statistical analysis

The sample was randomly divided into two subsamples for derivation and validation of the prediction rule (60% and 40% of the entire sample, respectively).
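As an illustration of the split described above, the following minimal Python sketch divides a list of patient identifiers into derivation and validation subsamples. The function name, the fixed seed, and the use of consecutive integers as identifiers are assumptions made for the example; the study itself used SAS and R, and its actual randomization procedure is not described.

```python
import random


def split_derivation_validation(patient_ids, derivation_fraction=0.6, seed=2020):
    """Randomly split identifiers into derivation and validation subsamples."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is reproducible
    rng.shuffle(ids)
    n_derivation = round(len(ids) * derivation_fraction)
    return ids[:n_derivation], ids[n_derivation:]


# Hypothetical identifiers: one integer per patient in a cohort of 49,750
derivation, validation = split_derivation_validation(range(49750))
print(len(derivation), len(validation))  # 29850 19900
```

With a 60/40 split of 49,750 patients, the derivation subsample contains 29,850 patients, which matches the N reported for the derivation sample in Table 2.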

Descriptive statistics included frequency tables for categorical variables and means and standard deviations (SDs) for continuous variables. Patient characteristics were compared between the two subsamples (derivation vs. validation) using Chi-square or Fisher’s exact tests for categorical variables, and Student’s t test or nonparametric Wilcoxon tests for continuous variables.

Univariate logistic regression models were first built using the derivation sample to identify the significance of each potential risk factor for predicting hospital admission. In these models, hospital admission was used as the dependent variable and all candidate predictive variables (described previously) as the independent variables. Then, independent variables with p < 0.20 in the univariate analyses were considered potential independent variables in the multivariate analysis, for which multilevel analysis with generalized estimating equations was performed, taking the IHO into account. Potential interactions between variables were also examined. In the final models, only factors with p < 0.05 were retained. Odds ratios (ORs) and 95% confidence intervals (CIs) were calculated. The predictive accuracy of the model was determined by calculating the area under the ROC curve (AUC) for discrimination [17], and by comparing predicted and observed hospital admission using the Hosmer and Lemeshow test for calibration [18].
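The two accuracy measures named above can be sketched in a few lines of Python. This is an illustrative sketch rather than the code used in the study: the function names and the toy data are hypothetical, the AUC is computed through its equivalence with the Mann-Whitney statistic, and only the Hosmer-Lemeshow statistic is returned (its p-value would come from a chi-square distribution with the number of groups minus 2 degrees of freedom).

```python
from bisect import bisect_left, bisect_right


def auc(predicted, admitted):
    """Area under the ROC curve via the Mann-Whitney statistic.

    This is the probability that a randomly chosen admitted patient has a
    higher predicted value than a non-admitted one (ties count one half).
    """
    positives = [p for p, y in zip(predicted, admitted) if y == 1]
    negatives = sorted(p for p, y in zip(predicted, admitted) if y == 0)
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in positives:
        below = bisect_left(negatives, p)
        ties = bisect_right(negatives, p) - below
        wins += below + 0.5 * ties
    return wins / (len(positives) * len(negatives))


def hosmer_lemeshow_statistic(predicted, admitted, groups=10):
    """Hosmer-Lemeshow chi-square statistic over groups of ordered predictions.

    Assumes predicted probabilities lie strictly between 0 and 1, so that the
    expected count in each group is neither 0 nor the group size.
    """
    ordered = sorted(zip(predicted, admitted), key=lambda pair: pair[0])
    n = len(ordered)
    statistic = 0.0
    for g in range(groups):
        chunk = ordered[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        variance = expected * (1 - expected / len(chunk))
        statistic += (observed - expected) ** 2 / variance
    return statistic


# Hypothetical toy data: predicted probabilities and observed admissions
toy_predicted = [0.1, 0.4, 0.35, 0.8]
toy_admitted = [0, 0, 1, 1]
print(auc(toy_predicted, toy_admitted))  # 0.75
print(hosmer_lemeshow_statistic(toy_predicted, toy_admitted, groups=2))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination, while a small Hosmer-Lemeshow statistic indicates that predicted and observed admissions agree within each group.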

To develop the predictive risk score, we first assigned a weight to each risk factor in relation to its β parameter in the multilevel model. Then, we summed the weights of the risk factors present in each patient, with higher scores indicating a greater likelihood of hospital admission. The predictive accuracy of the hospital admission risk score was assessed using the AUC [17] in both the derivation and validation samples (external validation). Furthermore, we sought to validate the risk score by K-fold cross-validation [19, 20], which uses part of the available data of the derivation sample to fit the model and a different part to test it (internal validation) [19].
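The exact rule used to convert β parameters into weights is not detailed here, so the following Python sketch shows one common convention purely as an assumption: each β (the natural logarithm of the OR) is divided by the smallest absolute β and rounded to the nearest integer. The function names and factor labels are hypothetical; the two ORs are the values reported in the Results for admission in the previous month (4.23) and flu vaccination (1.18).

```python
import math


def weights_from_odds_ratios(odds_ratios):
    """Convert ORs to integer weights, scaling by the smallest |beta|.

    Assumed convention, not necessarily the one used in the study:
    weight = round(beta / min |beta|), with beta = ln(OR).
    """
    betas = {name: math.log(value) for name, value in odds_ratios.items()}
    base = min(abs(beta) for beta in betas.values())
    return {name: round(beta / base) for name, beta in betas.items()}


def risk_score(patient_factors, weights):
    """Sum the weights of the risk factors present in one patient."""
    return sum(weights[factor] for factor in patient_factors)


# ORs taken from the multivariable results; labels are hypothetical
example_weights = weights_from_odds_ratios({
    "admission_previous_month": 4.23,
    "flu_vaccination": 1.18,
})
print(example_weights)
print(risk_score(["admission_previous_month", "flu_vaccination"], example_weights))
```

Under this assumed convention, a patient's score is simply the sum of integer points, which is what makes such a scale quick to complete by hand. The resulting points would not necessarily match those behind the 0 to 97 range of the published score.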

Based on the hospital admission risk score, we categorized patients into different levels of risk. The optimal thresholds on the continuous risk score were determined with the catpredi function of the R package CatPredi using the genetic algorithm [21]. The performance of the risk classification was studied by comparing the hospital admission rate between categories and by using multilevel analyses with generalized estimating equations and AUCs, in both the derivation and validation samples.
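Once thresholds are available, assigning patients to risk levels and comparing admission rates between levels is straightforward. The Python sketch below assumes hypothetical cut points and toy data; it does not reproduce the genetic algorithm search performed by CatPredi, and the actual thresholds of the study are not used.

```python
from bisect import bisect_right
from collections import defaultdict


def risk_category(score, cut_points):
    """Return a 1-based risk category for a score.

    cut_points must be sorted in ascending order; each cut point is the
    inclusive lower bound of the next category.
    """
    return bisect_right(cut_points, score) + 1


def admission_rate_by_category(scores, admitted, cut_points):
    """Observed proportion of admitted patients within each risk category."""
    totals = defaultdict(int)
    events = defaultdict(int)
    for score, outcome in zip(scores, admitted):
        category = risk_category(score, cut_points)
        totals[category] += 1
        events[category] += outcome
    return {category: events[category] / totals[category] for category in sorted(totals)}


# Hypothetical thresholds defining five categories on a 0 to 97 score
HYPOTHETICAL_CUT_POINTS = [10, 26, 40, 60]

# Hypothetical toy data: risk scores and observed admissions (1 = admitted)
toy_scores = [5, 8, 30, 35, 70]
toy_admitted = [0, 0, 0, 1, 1]
print(admission_rate_by_category(toy_scores, toy_admitted, HYPOTHETICAL_CUT_POINTS))
# {1: 0.0, 3: 0.5, 5: 1.0}
```

A well-behaved classification should show admission rates that increase steadily from the lowest to the highest category.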

All effects were considered significant at p < 0.05. All statistical analyses were performed using SAS for Windows, version 9.4 (SAS Institute, Cary, NC), and R© version 4.0.4.

Results

A total of 49,750 COVID-19 positive patients were included in the analysis. The whole sample was divided into derivation and validation samples for the purpose of this study. The main characteristics of both samples are described in Table 1. The univariable analysis results for risk of hospital admission in the derivation sample are shown in Table 2. As can be seen, multiple background characteristics, comorbidities, and baseline patient treatments were related to hospital admission.

Table 1 Comparative analysis between the derivation and validation samples
Table 2 Univariate analysis for prediction of hospital admissions in derivation sample (N = 29,850)

The multivariable multilevel analysis for the prediction of the risk of admission in the derivation sample is shown in Table 3. Males showed a higher risk of hospital admission. The gradual effect of age was maintained, with the risk of admission increasing up to 89 years of age and decreasing from 90 years of age onward. It should be noted that the age groups composed of those under 49 years of age did not exceed an OR of 8.2. Regarding comorbidities, the results observed in the univariable analysis were retained, with coagulopathies, cancer, diabetes with organ damage, and liver disease standing out among them. Having been admitted to hospital in the month before the COVID-19 diagnosis was associated with hospital admission, with an OR of 4.23 (95% CI 3.24–5.52). Flu vaccination in the previous season was also a risk factor for hospital admission, with an OR of 1.18 (95% CI 1.08–1.29). However, the pneumococcal vaccine was no longer predictive of the risk of admission. Regarding treatments, the drugs that increased the risk were baseline prescriptions of chronic systemic steroids, immunosuppressants, angiotensin-converting enzyme (ACE) inhibitors, and NSAIDs. The AUC for this model was 0.821 (95% CI 0.815–0.828) (Fig. 1).

Table 3 Multivariable analysis for prediction of hospital admission in the derivation sample using multilevel analysis
Fig. 1 ROC curves for the risk score in both derivation and validation samples, and k-fold cross-validation

Based on the multivariable model, we assigned a weight to each category of each significant predictor and created a risk score ranging from 0 to 97. The AUCs (95% CI) of the risk score were 0.821 (0.815–0.828) and 0.828 (0.821–0.835) in the derivation and validation samples, respectively. A similar AUC was obtained by K-fold cross-validation, 0.820 (95% CI 0.814–0.827) (Table 3).

From this risk score, we developed five risk groups for hospital admission, in which patients’ risk of admission ranged from 2.26 to 54.58% from the lowest to the highest risk category in the derivation sample (Table 4), with an AUC of 0.814 (95% CI 0.808–0.821). These risk groups were validated in the validation sample, reaching an AUC of 0.820 (95% CI 0.812–0.827), with a risk of admission ranging from 2.09 to 54.31% from the lowest to the highest risk category (Table 4).

Table 4 Risk groups of hospitalization in the derivation and validation samples

We also present different scenarios for the sensitivity and specificity of our risk score at different cut-off points of the previous risk groups (Table 5), with a sensitivity of 85.7% and a specificity of 61.75% (85.53% and 62.35%, respectively, in the validation sample) for a risk score ≥ 26 points.

Table 5 Sensitivity, specificity, and positive and negative predictive values according to different cut-off points in both derivation and validation samples
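For readers who wish to reproduce this type of table with their own data, the Python sketch below shows how sensitivity, specificity, and predictive values follow from a score cut-off. The function name and the toy data are hypothetical and do not come from the study; only the cut-off of 26 points is taken from the text.

```python
def classification_metrics(scores, admitted, cutoff):
    """Sensitivity, specificity, PPV, and NPV when score >= cutoff flags high risk."""
    tp = fp = tn = fn = 0
    for score, outcome in zip(scores, admitted):
        flagged = score >= cutoff
        if flagged and outcome == 1:
            tp += 1  # flagged and admitted
        elif flagged:
            fp += 1  # flagged but not admitted
        elif outcome == 1:
            fn += 1  # not flagged but admitted
        else:
            tn += 1  # not flagged and not admitted
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical toy data: risk scores and observed admissions (1 = admitted)
toy_scores = [10, 20, 26, 30, 40, 50, 15, 60]
toy_admitted = [0, 0, 0, 1, 1, 1, 1, 0]
metrics = classification_metrics(toy_scores, toy_admitted, cutoff=26)
print(metrics)
```

Lowering the cut-off raises sensitivity at the cost of specificity, which is the trade-off explored across the scenarios in Table 5.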

Additionally, other relevant variables such as smoking habits and BMI were considered. Nevertheless, smoking habits were finally excluded from the final model because this variable was only present in 39,193 records and was not considered reliable enough. BMI presented a similar situation, since it was available in only 16,237 records; however, when included in our multivariable model, it was significantly associated with hospital admission, with the OR increasing from 1.19 (25 ≤ BMI < 30) to 1.25 (30 ≤ BMI < 35), 1.39 (35 ≤ BMI < 40), and 2.36 (BMI ≥ 40) compared to BMI < 25.

Discussion

This is a population-based study of a cohort of 49,750 consecutive patients in the Basque Country with microbiologically confirmed SARS-CoV-2 infection. In line with other studies, we found that males have a higher risk of requiring hospitalization and a worse course of the disease [22]. Age has a gradual impact on the risk of requiring hospitalization. Several studies indicate that patients over 50 have a higher risk of needing intensive care treatment or of death [22], while others place this threshold at 60 [23] or even 68 in Italy [24]. Regarding comorbidities, in our cohort coagulopathies were the pathologies that carried the greatest risk of hospitalization and the main factor in the final outcome [25]. Numerous studies report that cardiovascular illness carries a greater risk of a poor prognosis [4], as do HTN and diabetes [6], which also figure in our study among the top five factors.

Cancer patients have a higher risk of suffering complications due to COVID-19, although it must be considered that most cancer patients are over 65, suffer from one or more comorbidities, and receive treatment that frequently causes immunosuppression [26]. This poor prognosis is in line with the results of different studies with cohorts in the USA [27] and the UK [22], and in our study cancer was one of the five comorbidities carrying the greatest risk of hospitalization.

Regarding BMI, the results obtained highlight a risk gradient that increases with weight. In a study based on a cohort of 306 patients [28], the main finding concerned the high frequency of obese patients requiring intensive care treatment, with a marked difference from patients treated in previous years; 47.5% had a BMI ≥ 30 kg/m², 13.7% a BMI of 35–39.9 kg/m², and 14.5% a BMI ≥ 40 kg/m².

As far as smoking habits are concerned, a meta-analysis conducted by Carlos A. Jiménez-Ruiz [29] was not able to prove that smokers had a higher chance of becoming infected, but did show that current and former smokers had about twice the risk of a worsening condition or adverse outcome (OR 1.96, 95% CI 1.36–2.83). Equally, smoking, whether current or past, was shown to be a risk factor for the most critical outcomes, namely orotracheal intubation, intensive care treatment, or death (OR 1.79, 95% CI 1.19–2.70). However, the data we obtained on smoking are neither statistically significant nor reliable enough.

With respect to the flu vaccine, in the Basque Country it is targeted primarily at those aged 65 or over, patients with chronic illness, and pregnant women, and it is recommended to those who work in the Health Service [30]. This distribution of the vaccine may help to explain the link between having been vaccinated and being at greater risk of hospital admission for COVID-19. The association is therefore statistically significant but not causal, and is mainly due to the distribution of, and degree of compliance with, the vaccine in the older population and those most at risk in our setting.

Regarding treatment, a higher risk has been observed in patients who use ACE inhibitors and NSAIDs. This could be due to confounding by the underlying comorbidities, which are treated more frequently with these medicines and are themselves linked to higher rates of hospitalization and death from COVID-19 [5]. Nevertheless, based on the evidence available to date, there are no recommendations that suggest the need to modify the use of ACE inhibitors, and therefore, they should be introduced or maintained in accordance with current guidelines, independently of SARS-CoV-2. This conclusion, however, should be constantly reviewed.

Regarding the limitations of the study, first, the population studied included only patients in the Basque Country with a positive COVID-19 diagnosis in the general population; however, we hope that this model will be used more widely in other geographic regions. Second, although the data collected through our system and used for extraction and analysis allowed a solid and rapid analysis of a large cohort, the retrospective design meant that not all variables of interest were obtained for all patients, or in a reliable way, as was the case with BMI and smoking habits, and therefore these variables cannot be evaluated appropriately. Moreover, for some of the cases diagnosed during the first wave, the health crisis meant that the information gathered was incomplete and lacked the detail necessary for a more in-depth analysis of specific aspects. Among the strengths of the study, it should be noted that the multicenter, population-based design was developed in a public health system with universal access for the entire population, with a network of primary care, hospital care, and emergency services fully connected through a shared clinical history and a universal laboratory network. Therefore, the size and wide geographic coverage provide the statistical power to confirm the hypothesis and represent strong points of the study, together with the potential to apply the model at the first line of medical attention. In addition, from a practical point of view, the proposed model for predicting hospitalization would allow patients in the two lowest risk categories, which have a high negative predictive value (95.79%), to be sent home, and possibly even those in category 3. Patients in categories 3–4 could be managed taking other data into account, while those in category 5, who may require hospitalization, should be evaluated in depth.
To date, the criteria used to admit patients to hospital have been those continuously updated by the WHO [31] and the guidelines of the Spanish Ministry of Health [32], which specify the criteria to be followed when evaluating the severity of patients in hospital emergency services for subsequent admission or home monitoring by primary care teams, with age, comorbidities, fever, and respiratory failure as the main factors to be taken into account. Nevertheless, in the first few weeks of the pandemic, the patients admitted were the more severe ones, which may condition our conclusions, although this was limited in time.

In conclusion, we propose a risk scale for hospitalization in COVID-19 patients, which indirectly identifies people at higher risk of poor evolution, to facilitate clinical decision-making by both primary care professionals and emergency services. Our scale can be completed easily and quickly in primary care units or in hospital or out-of-hospital emergency services.