Introduction

During the COVID-19 pandemic, population-wide person-level electronic health record (EHR) data have become increasingly important for exploring, modeling, and reporting disease trends to inform healthcare and public health policy [1]. The growing availability of COVID-19 digital health data has fostered interest in the use of real-world data (RWD) [2], defined as patient data collected from EHRs, which can be analyzed to generate real-world evidence (RWE) [3]. Compared with conventional randomized clinical trials (RCTs), RWE can provide a more faithful picture of the clinical settings in which medical interventions are actually delivered, because RWD include detailed information on patient demographics, comorbidities, adherence, and concomitant prescriptions [4, 5]. Moreover, RWE studies are not only cheaper than RCTs but can also be completed much faster, an advantage when urgent decisions must be made, as in a pandemic. In particular, discovering drugs that could serve as effective COVID-19 therapies remains an urgent need. EHRs already contain extensive information on drugs that COVID-19 patients received for other indications and that could affect the progression of the disease. For example, RWE has recently shown that vitamin D has a significant protective effect in hospitalized COVID-19 patients [6]. RWD therefore open the door to large-scale drug repurposing studies, as well as to research on potential adverse effects of drugs or their interactions with COVID-19 progression.

Since 2001, the Andalusian Public Health System has systematically stored all EHR data of Andalusian patients in the Health Population Base (BPS) [7], currently one of the largest repositories of clinical data in the world, with more than 13 million comprehensive patient records [7]. Because of its size and the level of detail of the stored data, the BPS constitutes a unique and privileged environment for large-scale RWE studies.

Results

Data analysis

Clinical data for a total of 15,968 COVID-19 patients hospitalized in Andalusia between January and November 2020 were requested from the BPS. The data were transferred from the BPS to the Infrastructure for secure real-world data analysis (iRWD) [8] at the Foundation Progress and Health of the Andalusian Public Health System.

The endpoint considered was COVID-19 death during the first 30 days of hospital stay (see Methods). A total of 864 treatments were identified in the BPS drug archive among the patients analyzed. To assess whether any given treatment could potentially reduce mortality in COVID-19 inpatients, a covariate balance analysis was first carried out to establish the viability of a covariate-adjusted analysis. This analysis considers confounders, that is, covariates with an a priori possibility of confounding the association between a treatment and the survival outcome (sex, obesity, hypertension, cancer, pulmonary diseases, asthma, age, and mental diseases; see Methods and Table 1). For the drugs eligible for covariate-adjusted analysis, survival was estimated using a weighted Cox Proportional Hazards model (see Methods) conditioned on the confounders of interest (Table 1).

Table 1 Association between each covariate and the endpoint (death) using chi-squared tests, with the test p-values and the counts and proportions with respect to the endpoint

Because laboratory results are also available in the BPS, lymphocyte progression, in which high lymphocyte levels reflect a favorable disease course, was assessed together with drug treatment by a Linear Mixed Effects analysis, weighted with the same scheme as the survival analysis (see Methods for details).

Drugs with significant effect on patient survival

After correction for possible confounding covariates and for multiple testing, the survival estimations showed that 21 drugs have a significant effect on patient survival and are also associated with a significant increase in lymphocyte counts (see Fig. 1; Table 2). Figure 2 shows the pattern of lymphocyte counts over the course of infection in the study period for three drugs: enoxaparin (Fig. 2A), which displays a clear trend toward high lymphocyte levels; calcifediol (Fig. 2B), whose previously reported protective effect [6] is also supported by high lymphocyte levels; and, as a counterexample, furosemide, which is linked here to an increased risk of death and shows lymphocyte levels below the population average (Fig. 2C). Table S2 contains an exhaustive list of the results obtained for the drugs tested.

Fig. 1

Impact of drugs on patient survival. Adjusted log-hazard ratios with 95% confidence intervals for all eligible treatments that were significant in both analyses (survival and lymphocyte count progression), before and after FDR adjustment

Table 2 Log hazard ratios obtained for the drugs tested, along with standard deviations (SDs) and FDR-adjusted p-values, and lymphocyte proliferation values (see Methods) with their FDR-adjusted p-values. The last two columns indicate the drugs used in the machine learning drug repurposing prediction study and the significance of the prediction
Fig. 2

Lymphocyte counts. Plots showing the evolution of lymphocyte counts over the study period (15 days from hospital admission) for (A) enoxaparin, (B) calcifediol, and (C) furosemide

Validation of previous machine learning predictions

Notably, a number of the drugs found to affect COVID-19 patient survival had previously been predicted as potentially active against COVID-19 [9] using machine learning and mathematical modeling [10] of the recently proposed COVID-19 Disease Map [11] (see the last two columns of Table S2). Among the drugs eligible for the covariate-adjusted analysis (those in Table S2), drugs predicted as repurposable by the machine learning model are significantly enriched among those with a significant protective effect in the covariate-adjusted survival test (χ2 = 4.003, p-value = 0.0454), which supports the validity of the previous predictions [9].
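The reported p-value can be checked from the test statistic alone. A minimal sketch, assuming the enrichment test is a chi-squared test with one degree of freedom (as for a 2×2 table of predicted versus not predicted and protective versus not protective; the table itself is not given here): because a χ2 variable with one degree of freedom is the square of a standard normal variable, its upper-tail probability is erfc(√(x/2)).

```python
import math


def chi2_1df_pvalue(statistic: float) -> float:
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom.

    A chi-squared(1) variable is Z**2 for a standard normal Z, so
    P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if statistic < 0:
        raise ValueError("chi-squared statistic must be non-negative")
    return math.erfc(math.sqrt(statistic / 2.0))


# Statistic reported for the enrichment test in the text.
print(f"p = {chi2_1df_pvalue(4.003):.4f}")  # prints p = 0.0454
```

Only the statistic is needed here, so the same check holds whether or not a continuity correction was used to compute it.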

Discussion

The drugs associated with the highest survival were bemiparin (DB09258; logarithm of the hazard ratio (LHR) = -1.62, 95% confidence interval (CI) [-1.95, -1.31], False Discovery Rate (FDR)-adjusted p-value < 10^-11) and enoxaparin (LHR = -1.17, 95% CI [-1.36, -0.98], FDR-adjusted p-value < 10^-11). Both are low-molecular-weight heparins used, like other heparins, as antithrombotics to prevent thrombotic and thromboembolic complications in hospitalized patients. While only weak evidence of a protective effect of bemiparin has been found in the literature [12], a lower mortality rate in COVID-19 patients was described for enoxaparin compared with other heparins [13], in agreement with the results found here. However, this protective effect is not shared by other anticoagulants, such as tinzaparin (LHR = -0.34, 95% CI [-1.38, 0.69], FDR-adjusted p-value = 1), despite its use in pulmonary embolism, or fondaparinux (LHR = -0.33, 95% CI [-1.64, 0.97], FDR-adjusted p-value = 1). Calcifediol and cholecalciferol, already described by us in a previous work [6], are significantly associated with better patient survival, probably owing to the protective role of vitamin D through its pro-immune and anti-inflammatory properties. Other studies also suggest a protective effect of ascorbic acid (vitamin C) [14]. Table S2 contains an exhaustive list of the results obtained for the drugs tested.
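Log hazard ratios are easier to interpret once exponentiated back onto the hazard ratio scale. The short sketch below applies that conversion to the bemiparin and enoxaparin estimates quoted above; it uses only the published point estimates and CI bounds, and the dictionary layout is illustrative.

```python
import math

# Log hazard ratio (LHR) and 95% CI bounds, as reported in the text.
reported = {
    "bemiparin": (-1.62, -1.95, -1.31),
    "enoxaparin": (-1.17, -1.36, -0.98),
}


def to_hazard_ratio(lhr: float, lower: float, upper: float):
    """Exponentiate an LHR and its CI bounds; exp is monotone, so the CI order is kept."""
    return math.exp(lhr), math.exp(lower), math.exp(upper)


for drug, values in reported.items():
    hr, lo, hi = to_hazard_ratio(*values)
    # HR < 1: lower estimated hazard of death in treated patients than in the weighted comparison group.
    print(f"{drug}: HR = {hr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Under the model's assumptions, this gives hazard ratios of roughly 0.20 (95% CI 0.14 to 0.27) for bemiparin and 0.31 (95% CI 0.26 to 0.38) for enoxaparin, i.e. an estimated hazard of in-hospital COVID-19 death about one fifth and one third, respectively, of that of the weighted untreated comparison group.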

One of the drugs with a significant protective effect is simvastatin, a widely used statin; statins are a group of drugs that reduce blood levels of low-density lipoprotein (LDL) cholesterol. Statins are also known for their pleiotropic effects, exerting anti-inflammatory and antithrombotic actions by inhibiting the NF-κB pathway, which directly reduces inflammatory cytokines (IL-1, IL-6, TNF-α), CRP, and neutrophils [15]. Furthermore, a retrospective study of hospitalized COVID-19 patients showed that statins inhibit RAS activation and reduce the proinflammatory effects of angiotensin II, thereby improving endothelial function and remodeling after vascular injury [16]. A recent in vitro study showed that simvastatin pretreatment of human Calu-3 lung epithelial cells inhibited SARS-CoV-2 binding and entry by inducing a redistribution of ACE2 receptors that lowered their concentration on the plasma membrane [17]. Recent retrospective studies also point to a relationship between statin use and a reduced risk of mortality in COVID-19 patients [16, 18]. Another drug identified here is hydrochlorothiazide, a diuretic often combined with ACE inhibitors such as enalapril in antihypertensive therapy [19]. Patients with hypertension have been reported to be more susceptible to a severe COVID-19 course [20], underlining hypertension as a risk factor for increased mortality in infected patients. Although the effect of antihypertensive drugs in COVID-19 patients with hypertension is controversial, the upregulation of ACE2 by ACE inhibitors was linked to dampened hyperinflammation and increased intrinsic antiviral responses of the cells in hypertensive COVID-19 patients [21]. The results presented here, together with these previous reports, suggest that ACE inhibitors may have a protective effect in addition to helping to improve the prognosis of hypertensive patients.

Dexamethasone has been studied in the context of COVID-19 because of its anti-inflammatory properties [22]. Although ibuprofen and other analgesics such as acetaminophen were initially discouraged for COVID-19 patients [23], further studies based on observational data could not confirm the theoretical risks of ibuprofen and other nonsteroidal anti-inflammatory drugs (NSAIDs) in SARS-CoV-2 infection [24]. Moreover, other studies suggested that some NSAIDs could have antiviral activity against coronaviruses, including SARS-CoV-2 [25], which is consistent with the protective association observed here for ibuprofen. Similarly, tramadol, an opioid analgesic used to treat moderate to severe pain, was initially associated with a poor prognosis [26], but further studies suggested a potential therapeutic effect [27].

Empagliflozin is a sodium-glucose cotransporter 2 (SGLT2) inhibitor used in the treatment of type 2 diabetes, whose potential utility in COVID-19 patients has been suggested [28] but not yet demonstrated. Also in the context of diabetes, the available evidence suggests that sitagliptin may be beneficial in treating COVID-19, particularly in patients with type 2 diabetes, who appear to be at high risk of mortality and of cardiorenal or cerebrovascular complications [29]. Metformin, another diabetes treatment, has also been suggested to be effective in the treatment of COVID-19 [30].

It has been suggested that steroids used for asthma treatment could have a protective effect in COVID-19 [31], although beclometasone dipropionate specifically was not assessed. Corticosteroids, including prednisone, have also been reported to reduce mortality in COVID-19 patients when given within their therapeutic window [27], and budesonide to reduce hospitalization times [32]. Some studies suggest that formoterol could be used to improve lung function and help control symptoms in COVID-19 patients [33]; however, the available evidence does not suggest any significant interaction between formoterol and COVID-19 [34]. A recent study suggested that olmesartan could alleviate renal fibrosis induced by the SARS-CoV-2 envelope protein by regulating HMGB1 release and the autophagic degradation of TGF-β1 [35]. In the case of omeprazole, a proton pump inhibitor used to treat gastroesophageal reflux disease (GERD), peptic ulcer disease, and other acid-related disorders, several studies have indicated an antiviral effect [36], as well as a therapeutic role in combination with other antivirals [36]. Finally, azithromycin is an antibiotic with potential antiviral and anti-inflammatory properties [37], although the consensus is that there is no evidence to support its use in the treatment of COVID-19 [38].

On the other hand, a study suggested that furosemide, a diuretic used to treat fluid build-up due to heart failure, liver scarring, or kidney disease, as well as high blood pressure, may have therapeutic benefits for COVID-19 patients with acute respiratory distress syndrome [36]. This contrasts with the increased mortality risk observed here, which is also supported by the lymphocyte count data (see Table 2 and Supplementary Table S2). It is important to note that other drugs, which were marginally non-significant because of small sample sizes, may also have a negative effect on COVID-19 patient survival. These drugs have different mechanisms of action and are used to treat different conditions: latanoprost, used to treat glaucoma and ocular hypertension; ciprofloxacin, an antibiotic; tamsulosin, an alpha-blocker; trazodone, used to treat insomnia and anxiety; and lormetazepam, a benzodiazepine.

To our knowledge, previous studies have either found no evidence of an effect on COVID-19 prognosis or not detected the significant protective effects observed in this study for certain drugs, such as diazepam, gliclazide, hydrochlorothiazide, calcium, aspartic acid, codeine, ramipril, fluticasone furoate, azithromycin, and enalapril. The large sample size of this study and the appropriate management of confounding variables allowed us to validate some proposed therapeutic interventions and to expand the number of potential COVID-19 treatments.

Conclusions

The Andalusian Population Health Database was used to explore drug repurposing with data from 15,968 COVID-19 patients hospitalized in Andalusia between January and November 2020. The study identified 21 drugs associated with improved patient survival and lymphocyte progression, which point to potential treatment options for COVID-19. In contrast, one drug, furosemide, was linked to increased patient mortality, a finding that requires further investigation. This study demonstrates the value of drug repurposing strategies for addressing emerging health challenges and underscores the importance of comprehensive clinical databases for advancing medical knowledge and patient care.

Materials and methods

Design and patient selection

This study uses a retrospective cohort of Andalusian patients with a COVID-19 diagnosis hospitalized between January 2020 and November 2020.

The Ethics Committee for the Coordination of Biomedical Research in Andalusia approved the study (29th September, 2020, Acta 09/20) and waived informed consent for the secondary use of clinical data for research purposes.

Data management

Clinical data corresponding to COVID-19 patients hospitalized in Andalusia between January and November 2020 were requested from the Health Population Base (BPS) and transferred from there to the Infrastructure for secure real-world data analysis (iRWD) at the Foundation Progress and Health (FPS) of the Andalusian Public Health System for further analysis. Specifically, the data listed in Table S1 were extracted in the BPS from the electronic health records of each of the 15,968 COVID-19 patients who fulfilled the inclusion criteria and transferred to the FPS.

Data preprocessing

Medication data in the office and hospital pharmacy records were found for 864 treatments. Individuals were considered treated with a specific drug if a prescription and the corresponding pharmacy dispensation (hereinafter, a valid pharmacy order) were found within the period from 15 days before hospital admission until discharge (or death), up to a maximum of 14 days after admission. Otherwise, they were considered untreated.
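The exposure rule reduces to a date-window check. This is a minimal sketch, assuming that the dispensation dates of valid pharmacy orders are available per patient and reading the window as running from 15 days before admission to discharge or death, capped at 14 days after admission (consistent with the exposure definition given under Statistical analysis); the function and argument names are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable, Optional

DAYS_BEFORE_ADMISSION = 15     # look-back window before admission
MAX_DAYS_AFTER_ADMISSION = 14  # assumed cap on the in-hospital part of the window


def is_treated(admission: date,
               discharge_or_death: Optional[date],
               valid_order_dates: Iterable[date]) -> bool:
    """True if any valid pharmacy order (prescription + dispensation) falls in the window."""
    start = admission - timedelta(days=DAYS_BEFORE_ADMISSION)
    end = admission + timedelta(days=MAX_DAYS_AFTER_ADMISSION)
    if discharge_or_death is not None:
        end = min(end, discharge_or_death)  # the window also closes at discharge or death
    return any(start <= d <= end for d in valid_order_dates)


admission, discharge = date(2020, 4, 1), date(2020, 4, 20)
print(is_treated(admission, discharge, [date(2020, 3, 20)]))  # True: 12 days before admission
print(is_treated(admission, discharge, [date(2020, 4, 18)]))  # False: 17 days after admission
```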

The endpoint studied was COVID-19 death (certified death events during hospitalization). As in previous studies, the first 30 days of hospital stay were considered for survival calculations [39]. The time variable in the models corresponds to the length of hospital stay in days. Stays involving one or more transfers between hospital units were combined into a single stay, whose admission date is the start of the first stay and whose discharge date is the end of the last one. Only the first stay of each patient was considered, to reduce potential biases due to reinfection.
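The stay-merging and censoring rules above can be sketched as follows. The data layout is an assumption, and so is the rule for recognizing a unit transfer: the text does not say how transfers are identified, so here a stay that starts no more than `max_gap_days` after the current episode ends is treated as a transfer (a hypothetical parameter). Deaths after day 30 are censored rather than counted as events.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple

FOLLOW_UP_DAYS = 30  # survival is evaluated over the first 30 days of stay


@dataclass
class Stay:
    admission: date
    discharge: date  # discharge date, or date of death
    died: bool       # certified COVID-19 death during this stay


def first_merged_stay(stays: List[Stay], max_gap_days: int = 0) -> Stay:
    """Merge unit-transfer stays into one episode and return the patient's first episode.

    ASSUMPTION (hypothetical rule): a stay is a unit transfer when it starts no
    more than `max_gap_days` after the current episode ends.
    """
    ordered = sorted(stays, key=lambda s: s.admission)
    episode = ordered[0]
    for nxt in ordered[1:]:
        if (nxt.admission - episode.discharge).days > max_gap_days:
            break  # a later, separate admission (e.g. possible reinfection) is ignored
        episode = Stay(
            admission=episode.admission,                      # start of the first stay
            discharge=max(episode.discharge, nxt.discharge),  # end of the last stay
            died=episode.died or nxt.died,
        )
    return episode


def survival_record(episode: Stay) -> Tuple[int, bool]:
    """Return (time in days, event), administratively censored at 30 days."""
    length = (episode.discharge - episode.admission).days
    if length > FOLLOW_UP_DAYS:
        return FOLLOW_UP_DAYS, False  # censored at day 30: later deaths are not events
    return length, episode.died


# Hypothetical patient: two contiguous unit stays, then a separate readmission.
stays = [
    Stay(date(2020, 4, 1), date(2020, 4, 5), False),
    Stay(date(2020, 4, 5), date(2020, 4, 12), True),
    Stay(date(2020, 6, 1), date(2020, 6, 10), False),
]
print(survival_record(first_merged_stay(stays)))  # (11, True)
```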

Covariate definition

Following previous studies [40], ICD codes were grouped into the following conditions: diabetes mellitus (ICD10 code E11), diseases of the circulatory system (ICD10 codes I00-I99), diseases of the respiratory system (ICD10 codes J00-J99), neoplasms (ICD10 codes C00-D49), dementia (ICD10 codes F00-F03), anxiety or mood disorders (ICD10 codes F30-F48), and other mental diseases (ICD10 codes F04-F29 and F50-F99). Obesity and other associated conditions (ICD10 codes D5-D8) were checked for a possible confounding effect on the COVID-19 outcome, but no evidence of association was found in our database (non-significant χ2 association test). Age was categorized into the following ranges: [18, 40], [41, 67], and [68, 99). Sex was also considered as a known covariate. Table 1 displays the association between each covariate and the endpoint considered here, death, using chi-squared tests, along with the test p-values. Counts and proportions with respect to the endpoint are also provided.
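The covariate definitions can be expressed as a small lookup table. This is a minimal sketch, assuming ICD-10 codes arrive as strings such as "I10" or "J45.9" and that group membership is decided by the 3-character category; the dictionary, group labels, and helper names are hypothetical, and the obesity-related codes are left out because they were not retained as covariates.

```python
from typing import Optional, Set

# Groups and inclusive 3-character ICD-10 category ranges, as listed in the text.
ICD10_GROUPS = {
    "diabetes_mellitus": [("E11", "E11")],
    "circulatory": [("I00", "I99")],
    "respiratory": [("J00", "J99")],
    "neoplasms": [("C00", "D49")],
    "dementia": [("F00", "F03")],
    "anxiety_mood": [("F30", "F48")],
    "other_mental": [("F04", "F29"), ("F50", "F99")],
}


def icd10_groups(code: str) -> Set[str]:
    """Covariate groups for one ICD-10 code, e.g. 'J45.9' -> {'respiratory'}."""
    category = code.strip().upper().replace(".", "")[:3]  # letter + two characters
    # Plain string comparison works because every bound has the same letter-digit-digit shape.
    return {name for name, ranges in ICD10_GROUPS.items()
            if any(low <= category <= high for low, high in ranges)}


def age_band(age: int) -> Optional[str]:
    """Age categories used in the text: [18, 40], [41, 67] and [68, 99)."""
    if 18 <= age <= 40:
        return "[18, 40]"
    if 41 <= age <= 67:
        return "[41, 67]"
    if 68 <= age < 99:
        return "[68, 99)"
    return None  # ASSUMPTION: ages outside the stated ranges are left uncategorized


print(icd10_groups("I10"), icd10_groups("F32.1"), age_band(72))
```

A patient would then be flagged for a condition if any of their recorded codes maps to that group.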

Statistical analysis

To determine whether any given treatment could potentially reduce mortality in COVID-19 inpatients, three statistical analyses were conducted, considering covariates with an a priori possibility of confounding the association between a treatment and the survival outcome [41] (see previous section).

First, the survival outcome was estimated using a Cox Proportional Hazards model weighted by Inverse Probability of Treatment Weighting (IPTW). The weights were computed from a logistic regression model of treatment on the confounders of interest, fitted on the whole cohort, and were set to estimate the Average Treatment effect on the Treated (ATT). ATT is the most commonly used estimand in weighting approaches to treatment effect estimation [42]. To obtain an accurate measure of the variability of the marginal hazard ratios, the previously proposed closed-form variance estimator [43] was used.
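The weighting and survival steps can be illustrated end to end on toy data. This is a minimal sketch, not a reproduction of the R packages used in the study or of the closed-form variance estimator [43]: the propensity model is fitted by plain gradient ascent (instead of R's maximum-likelihood `glm` routine), ATT weights are set to 1 for treated patients and e/(1 − e) for untreated ones (e being the propensity score), and the log hazard ratio of a single binary treatment indicator is obtained by Newton-Raphson on a weighted Cox partial likelihood with the Breslow handling of ties. All function names and data are hypothetical.

```python
import math
from typing import List, Sequence


def _propensity(beta: Sequence[float], xi: Sequence[float]) -> float:
    """Logistic model: P(treated | covariates xi)."""
    z = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, t, lr: float = 0.1, n_iter: int = 5000) -> List[float]:
    """Fit the propensity model by plain gradient ascent on the mean log-likelihood."""
    beta = [0.0] * (len(X[0]) + 1)  # intercept, then one coefficient per covariate
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for xi, ti in zip(X, t):
            resid = ti - _propensity(beta, xi)
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        beta = [b + lr * g / len(X) for b, g in zip(beta, grad)]
    return beta


def att_weights(X, t) -> List[float]:
    """ATT weights: 1 for treated patients, e / (1 - e) for untreated ones."""
    beta = fit_logistic(X, t)
    return [1.0 if ti == 1 else _propensity(beta, xi) / (1.0 - _propensity(beta, xi))
            for xi, ti in zip(X, t)]


def weighted_cox_lhr(time, event, z, w, n_iter: int = 25) -> float:
    """Log hazard ratio for a binary treatment z from a weighted Cox partial likelihood.

    Ties use the Breslow approximation; the maximum is found by Newton-Raphson.
    """
    n, beta = len(time), 0.0
    for _ in range(n_iter):
        score = info = 0.0
        for i in range(n):
            if not event[i]:
                continue
            risk = [j for j in range(n) if time[j] >= time[i]]  # still at risk at time[i]
            s0 = sum(w[j] * math.exp(beta * z[j]) for j in risk)
            s1 = sum(w[j] * z[j] * math.exp(beta * z[j]) for j in risk)
            p = s1 / s0  # weighted share of treated patients in the risk set
            score += w[i] * (z[i] - p)
            info += w[i] * p * (1.0 - p)  # z is binary, so its risk-set variance is p(1 - p)
        beta += score / info
    return beta


# Hypothetical toy cohort: one binary confounder (e.g. age >= 68), 30-day follow-up.
X = [[0], [0], [0], [1], [0], [1], [1], [1]]
treated = [0, 0, 0, 0, 1, 1, 1, 1]
days = [2, 3, 5, 8, 4, 9, 12, 15]
died = [1, 1, 1, 0, 1, 0, 1, 0]

w = att_weights(X, treated)
print("unweighted LHR:", round(weighted_cox_lhr(days, died, treated, [1.0] * 8), 2))
print("ATT-weighted LHR:", round(weighted_cox_lhr(days, died, treated, w), 2))
```

The weighting reshapes the untreated group to resemble the treated one on the confounder: here the untreated patient sharing the covariate profile of most treated patients receives a large weight, so the weighted estimate can differ markedly from the unweighted one.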

Second, lymphocyte progression, a marker of COVID-19 severity [44], was defined as the series of lymphocyte count measurements from the initial day of hospitalization up to day 14 [45]. Measurements dated outside the hospitalization period were omitted. Positive linear trends in daily lymphocyte counts, which are associated with reduced mortality in COVID-19, were compared between the treated population and an untreated control population. A Linear Mixed Effects (LME) analysis was conducted to estimate whether receiving a given treatment was associated with an increasing linear trend in log-transformed lymphocyte counts, and statistical significance was checked using an ANOVA of the model [46]. The model was weighted with the same weighting scheme as the survival analysis. In addition, a covariate balance analysis was carried out to determine the viability of the weighting scheme [47].
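The quantity targeted by the LME model, a difference in linear trends of log lymphocyte counts between treated and untreated patients, can be illustrated with a simpler two-stage (summary-measure) approximation rather than a mixed model. This sketch fits an ordinary least-squares slope to each patient's log counts over days 0 to 14 of the stay and then compares weighted mean slopes between groups. It does not reproduce the random-effects structure or the ANOVA test used in the study; the data layout, weights, and function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

MAX_DAY = 14  # measurements from admission (day 0) up to day 14


def log_count_slope(series: List[Tuple[int, float]]) -> float:
    """OLS slope of log lymphocyte count against day since admission, for one patient."""
    pts = [(d, math.log(c)) for d, c in series if 0 <= d <= MAX_DAY and c > 0]
    if len(pts) < 2:
        raise ValueError("need at least two in-window measurements")
    mean_d = sum(d for d, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((d - mean_d) ** 2 for d, _ in pts)
    if sxx == 0:
        raise ValueError("need measurements on at least two different days")
    sxy = sum((d - mean_d) * (y - mean_y) for d, y in pts)
    return sxy / sxx


def weighted_slope_difference(patients: Dict[str, dict]) -> float:
    """Weighted mean slope in treated patients minus that in untreated patients."""
    totals = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # treated flag -> [sum of w * slope, sum of w]
    for p in patients.values():
        slope = log_count_slope(p["counts"])
        totals[p["treated"]][0] += p["weight"] * slope
        totals[p["treated"]][1] += p["weight"]
    return totals[1][0] / totals[1][1] - totals[0][0] / totals[0][1]


# Hypothetical patients as (day, lymphocyte count); the day-20 value falls outside the window.
patients = {
    "a": {"treated": 1, "weight": 1.0, "counts": [(0, 0.8), (7, 1.2), (14, 1.6), (20, 5.0)]},
    "b": {"treated": 0, "weight": 0.5, "counts": [(0, 1.0), (7, 0.9), (14, 0.8)]},
}
print(round(weighted_slope_difference(patients), 3))
```

A positive difference indicates a steeper rise in log lymphocyte counts among treated patients; the mixed model used in the study additionally accounts for within-patient correlation and varying numbers of measurements.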

For each treatment, Inverse Probability Weighting (IPW) was applied, based on propensity scores generated with the WeightIt R package (v 0.12) [48]. Here, the exposed condition is having a valid pharmacy order for the treatment either during the 15 days before the start of the hospitalization event or during the first 14 days of hospitalization. To assess the viability of the IPW analysis, the proportion of covariates that could be effectively balanced was checked using standardized mean differences (SMD), as implemented in the cobalt R package (v 4.3.1) [49], with a threshold of 0.05 [47]. A treatment was considered eligible if all covariates could be properly balanced, which resulted in 122 eligible treatments out of the 864 initially found.
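The eligibility rule can be sketched as a weighted standardized mean difference check. Assumptions: the difference in weighted means is divided by the unweighted standard deviation of the covariate in the treated group (a common choice for ATT), and a covariate counts as balanced when |SMD| < 0.05. The cobalt package's own defaults may differ, for instance in how binary covariates are handled, so this is an illustration rather than a re-implementation; names and data are hypothetical.

```python
import math
from typing import Dict, Sequence


def _weighted_mean(x: Sequence[float], w: Sequence[float]) -> float:
    return sum(xi * wi for xi, wi in zip(x, w)) / sum(w)


def smd_att(x: Sequence[float], treated: Sequence[int], w: Sequence[float]) -> float:
    """Weighted standardized mean difference of one covariate (treated minus untreated).

    ASSUMPTION: scaled by the unweighted SD in the treated group, a common choice for ATT.
    """
    xt = [xi for xi, ti in zip(x, treated) if ti == 1]
    wt = [wi for wi, ti in zip(w, treated) if ti == 1]
    xc = [xi for xi, ti in zip(x, treated) if ti == 0]
    wc = [wi for wi, ti in zip(w, treated) if ti == 0]
    mean_t = sum(xt) / len(xt)
    sd_t = math.sqrt(sum((v - mean_t) ** 2 for v in xt) / (len(xt) - 1))
    if sd_t == 0:
        raise ValueError("covariate is constant among treated patients")
    return (_weighted_mean(xt, wt) - _weighted_mean(xc, wc)) / sd_t


def is_eligible(covariates: Dict[str, Sequence[float]], treated, w,
                threshold: float = 0.05) -> bool:
    """A treatment is eligible only if every covariate satisfies |SMD| < threshold."""
    return all(abs(smd_att(x, treated, w)) < threshold for x in covariates.values())


treated = [1, 1, 1, 1, 0, 0, 0, 0]
age_68_plus = [0, 1, 1, 0, 0, 0, 1, 1]  # hypothetical binary covariate
print(is_eligible({"age_68_plus": age_68_plus}, treated, [1.0] * 8))                    # True
print(is_eligible({"age_68_plus": age_68_plus}, treated, [1, 1, 1, 1, 1, 1, 1, 3.0]))  # False
```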

In both cases, p-values were corrected for multiple testing using the False Discovery Rate (FDR) [50]. Significance was set at 0.05, and 95% confidence intervals are provided.
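FDR-adjusted p-values of the kind reported in Table 2 can be computed with a few lines of code. A minimal sketch, assuming reference [50] denotes the Benjamini-Hochberg procedure (the usual meaning of FDR correction, and what R's `p.adjust(method = "BH")` computes): each sorted p-value is scaled by m/rank and a running minimum is taken from the largest p-value downward, capped at 1.

```python
from typing import List, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg FDR-adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)  # largest p first
    adjusted = [0.0] * m
    running_min = 1.0  # cap at 1 and enforce monotonicity of adjusted values
    for rank_from_top, i in enumerate(order):
        rank = m - rank_from_top  # 1-based rank in ascending order of p
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

A result is then declared significant when its adjusted p-value is below 0.05.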

Software used

Weights for IPW were computed with the WeightIt R package (v 0.12) [48]. Covariate balance for IPW was assessed with the cobalt R package (v 4.3.1) [49]. The survival estimation was conducted with the R package hrIPW (v 0.1.2) [51]. The LME analysis was conducted with the R package lme4 (v 1.1-27) [52], and the ANOVA of the LME model with the R package lmerTest (v 3.1-3) [46]. All analyses used R version 3.6.3 (2020-02-29).