Introduction

COVID-19, a complicated multi-system syndrome caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), first emerged in December 2019 and was declared a global pandemic by the World Health Organization (WHO) in March 2020 [1, 2]. Despite isolation measures, the COVID-19 pandemic has spread globally with over 772 million confirmed cumulative cases, and more than 6.9 million cumulative deaths, as reported by the WHO on December 17, 2023. The constant emergence of SARS-CoV-2 variants and the ongoing outbreak of COVID-19 have already consumed a large number of medical resources, leading to a considerable global healthcare crisis [2, 3]. Thus, identifying the causal factors of COVID-19 is crucial for reducing its disease burden. In particular, a deeper awareness of the causality and magnitude of the effects of different clinical risk factors may help identify high-risk individuals and provide further direction on the mechanisms of COVID-19.

Blood is critical for human health, underpinning various physiological processes such as immune response, oxygen transport, and maintenance of homeostasis, which when impaired cause a considerable health burden [4, 5]. Recently, several observational studies have reported correlations between COVID-19 and commonly used clinical blood test indicators [6,7,8]. Nevertheless, no causal inference can be made from these results because observational studies are subject to unmeasured confounding and reverse causality [9, 10]. Mendelian randomization (MR) can help fill this gap because alleles are randomly assigned and this allelic randomization precedes the onset of disease [10]. As a useful alternative to randomized controlled trials, MR has been widely used in recent years to infer the causal relationship between an exposure and an outcome [9,10,11]. Importantly, bidirectional MR allows for the evaluation of reverse causality, determining whether changes in blood markers are a consequence of COVID-19, thereby providing a comprehensive causal framework.

In the current study, we first collected clinical data on 58 common blood indicators from both healthy individuals and COVID-19 patients, covering a range of indicators impacting human health. We then sought to identify which of these indicators were associated with COVID-19. Subsequently, we obtained summary-level statistics for the candidate blood indicators from large-scale GWAS and sought to infer their potential causal effects on COVID-19 using large-sample summary statistics for three COVID-19 subtypes (COVID-19 infection, hospitalized COVID-19, and severe COVID-19). Complementing this, bidirectional MR analyses were conducted to discern not only the effect of blood indicators on COVID-19 risk but also the potential influence of COVID-19 on the levels of these blood indicators, thus providing a thorough exploration of their interplay.

Methods

Study design

We retrospectively collected clinical data from both healthy individuals and COVID-19 patients at the First Affiliated Hospital of Guangzhou Medical University to identify blood clinical indicators affecting COVID-19. Summary-level data for 58 blood indicators identified by the observational study were obtained from large-scale genome-wide association studies (GWAS). Two-sample Mendelian randomization (MR) analyses were applied to infer the causal nature of host genetic factors on COVID-19 risk by exploiting single nucleotide polymorphisms (SNPs) as instrumental variables (IVs) of exposure [10, 12]. All MR analyses must meet the following three assumptions. First, the selected IVs must be robustly associated with the exposure. Second, the IVs must not be related to potential confounders. Third, the IVs must affect the outcome only through the exposure. An MR analysis was excluded if pleiotropy was detected. Furthermore, bidirectional MR was incorporated to assess the potential reciprocal causal relationships between COVID-19 and the blood indicators, providing a comprehensive analysis of directionality in these associations. The workflow of the study design is presented in Fig. 1.

Fig. 1

Workflow of the study design. MR, Mendelian randomization; COVID-19, coronavirus disease 2019; SNP, single nucleotide polymorphism

Human cohort information

To identify the candidate risk factors of COVID-19, we retrospectively collected clinical data on 58 common blood indicators by reviewing patients' electronic health records from the First Affiliated Hospital of Guangzhou Medical University from January 2022 to the present. These indicators were divided into five major groups: hematological traits, hepatic and renal function markers, myocardial injury markers, metabolic indexes, and inflammatory parameters. Patients aged 18–65 years with a diagnosis of COVID-19 confirmed by reverse transcription-polymerase chain reaction (RT-PCR) were defined as the COVID-19 group. Healthy adults (aged 18–65 years) who tested negative for COVID-19 and attended a medical checkup were defined as the healthy control group. Participants who had missing information were eliminated from the study and the final analytic sample consisted of 1,325 healthy individuals and 901 COVID-19 patients who had provided informed consent to participate in this study. The demographic information of participants is summarized in Table S1. Given the retrospective nature of our research, the need for ethical approval and the informed consent statement was waived by the Ethics Committee of the First Affiliated Hospital of Guangzhou Medical University (China).

Genome‐wide association study (GWAS) summary datasets preparation

All GWAS summary statistics were obtained from the IEU OpenGWAS database. The COVID-19 datasets were obtained from publicly available GWAS summary results (https://gwas.mrcieu.ac.uk/datasets/), including those on COVID-19 infection (38,984 cases and 1,644,784 controls), hospitalized COVID-19 (9,986 cases and 1,887,658 controls), and severe COVID-19 (5,101 cases and 1,383,342 controls) [13]. Where feasible, independent samples were utilized to mitigate the risk of sample overlap between the COVID-19 dataset and the UK Biobank GWAS dataset. Despite these precautions, we acknowledge that complete exclusion of sample overlap cannot be guaranteed due to the shared nature of some data sources. The summary-level statistical data for white blood cell count (WBC), basophil cell count (BASO), lymphocyte cell count (LYM), eosinophil cell count (EOS), plateletcrit, and hematocrit were downloaded from the Blood Cell Consortium [4, 14]. The summary-level statistical data for neutrophil count (NEU), red blood cell count (RBC), hemoglobin concentration (HGB), mean platelet volume (MPV), platelet distribution width (PDW), alkaline phosphatase (ALP), albumin, total bilirubin (Tbil), HDL cholesterol (HDLC), total cholesterol (TC), apolipoprotein A-I (ApoA-I), C-reactive protein (CRP), and hemoglobin A1c (HbA1c) were obtained from UK Biobank GWAS [15, 16]. The summary-level statistical mixed-population data for glomerular filtration rate (GFR) were obtained from a large GWAS [17]. The summary-level statistical mixed-population data for troponin I (TnI), D-dimer (DD), and serum amyloid A-1 protein (SAA) were obtained from the INTERVAL study [18]. The summary-level statistical data for the levels of myoglobin [19] were obtained from large GWASs.

Genetic instrument selection

The candidate IVs were selected through a series of quality control steps, and instrumental SNPs meeting the quality control criteria were identified using R software. Initially, SNPs were selected as IVs if they met the genome-wide significance threshold (P < 5 × 10^-8). However, for exposures where fewer than three SNPs met this threshold, we relaxed the criterion to include SNPs with P < 5 × 10^-6 to ensure a sufficient number of instruments for robust Mendelian randomization analysis. Because SNPs that directly affect the outcome could violate the instrumental variable assumptions, any IVs not present in the outcome GWAS and those significantly associated with the outcome (P < 5 × 10^-5) were removed [20]. Furthermore, at least three of the selected IVs were required to be available in the outcome dataset. Subsequently, PhenoScanner V2 was used to evaluate whether any exposure-related IVs were associated with confounders of COVID-19. Palindromic and incompatible IVs were then removed during harmonization to ensure that the effect of each IV on the exposure corresponded to the same allele as its effect on the outcome. The F statistic was used to test for weak instrument bias and was calculated as F = [R^2 / (1 − R^2)] × [(n − k − 1) / k], where R^2 = 2 × MAF × (1 − MAF) × Beta^2, n is the sample size, k is the number of instrumental variables, and MAF is the minor allele frequency. Moreover, the statistical power of the MR analysis to detect a causal association was calculated with mRnd [21].
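The F-statistic formula above can be illustrated with a short Python sketch. It is not the authors' R code: the SNP values and sample size are hypothetical, the function names are introduced here for illustration, and summing the per-SNP R^2 values assumes the instruments are independent after clumping.

```python
def r_squared(maf: float, beta: float) -> float:
    """Variance explained by one SNP: R^2 = 2 * MAF * (1 - MAF) * beta^2."""
    return 2.0 * maf * (1.0 - maf) * beta ** 2


def f_statistic(r2: float, n: int, k: int) -> float:
    """F = R^2 / (1 - R^2) * (n - k - 1) / k."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    return r2 / (1.0 - r2) * (n - k - 1) / k


# Hypothetical instruments: (minor allele frequency, per-allele beta in SD units).
snps = [(0.30, 0.05), (0.12, 0.08), (0.45, 0.04)]
n = 100_000  # hypothetical exposure GWAS sample size

# Assumption: SNPs are independent, so their R^2 values can be summed.
total_r2 = sum(r_squared(maf, beta) for maf, beta in snps)
f_stat = f_statistic(total_r2, n=n, k=len(snps))

# F > 10 is the conventional cut-off for a sufficiently strong instrument set.
print(f"R^2 = {total_r2:.5f}, F = {f_stat:.1f}, strong = {f_stat > 10}")
```

With a single SNP (k = 1) the same function gives the per-variant F statistic.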

Statistical analyses

The Chi-square test was applied to detect differences in categorical variables, which were reported as percentages (%). Continuous variables were compared using t-tests or non-parametric Wilcoxon rank sum tests after testing the normality of the distribution using the Shapiro–Wilk test. Mean ± standard deviation (SD) was used to describe normally distributed continuous variables, while the median and interquartile range (IQR) were reported for variables that did not meet the normality assumption. To maintain the integrity of our statistical analysis and avoid the bias associated with listwise deletion, we implemented a mean imputation strategy for missing values. This method was selected based on its appropriateness for our data structure and the minimal impact it has on the distribution of observed data. The “TwoSampleMR” package in R (Version 4.0.2) was used to conduct the MR analysis. The conventional inverse-variance weighted (IVW) method was deemed the most reliable model because it provides the most persuasive estimates when there is no evidence of directional pleiotropy [22,23,24]. Moreover, the MR-Egger and weighted-median methods were implemented as sensitivity analyses to ensure the robustness of the results [23, 24]. The MR-Egger test for directional pleiotropy and Cochran’s Q statistic were applied to identify whether significant directional pleiotropy or heterogeneity was present.
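As a rough illustration of the IVW estimate and Cochran's Q described above, the following Python sketch combines per-SNP Wald ratios using fixed-effect, first-order weights. It is a simplified stand-in rather than the TwoSampleMR implementation used in the study; the function name and the summary statistics are hypothetical.

```python
import math


def ivw(beta_x, beta_y, se_y):
    """Fixed-effect IVW estimate from harmonized per-SNP summary statistics.

    beta_x: SNP-exposure effects; beta_y: SNP-outcome effects;
    se_y: standard errors of the SNP-outcome effects.
    Returns (estimate, standard error, Cochran's Q).
    """
    # Wald ratio for each SNP: outcome effect divided by exposure effect.
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]
    # First-order inverse-variance weights: beta_x^2 / se_y^2.
    weights = [(bx / sy) ** 2 for bx, sy in zip(beta_x, se_y)]
    total_w = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_w
    se = math.sqrt(1.0 / total_w)
    # Cochran's Q: weighted squared deviations of the ratios from the estimate,
    # compared against a chi-square distribution with (k - 1) degrees of freedom.
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    return estimate, se, q


# Hypothetical harmonized data for four SNPs.
beta_x = [0.10, 0.20, 0.15, 0.12]
beta_y = [-0.018, -0.030, -0.027, -0.020]
se_y = [0.010, 0.012, 0.011, 0.010]

estimate, se, q = ivw(beta_x, beta_y, se_y)
print(f"log-odds per SD = {estimate:.3f} (SE {se:.3f}), Q = {q:.2f} on {len(beta_x) - 1} df")
```

A large Q relative to its degrees of freedom would suggest heterogeneity among instruments, in which case a random-effects IVW model is usually preferred.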

Results

Distinct blood indicator profiles in COVID-19 patients compared to healthy controls

In our retrospective analysis, we evaluated 1,325 healthy individuals and 852 COVID-19 patients from the First Affiliated Hospital of Guangzhou Medical University (Table S1). The COVID-19 cohort was significantly older than the controls, with a balanced gender distribution (Table 1). We assessed 58 blood biomarkers, commonly used in clinical settings, for potential associations with COVID-19 (Fig. 2A, Table S1). Following FDR correction, we found significant differences between the groups across a range of biomarkers, including counts of various blood cells (white, basophil, lymphocyte, eosinophil, neutrophil), red blood cell parameters (count, hemoglobin concentration, hematocrit), platelet metrics (mean volume, plateletcrit, distribution width), coagulation profiles (prothrombin time, activated partial thromboplastin time, thrombin time, fibrinogen, D-dimer), metabolic and organ function indicators (uric acid, urea nitrogen, alkaline phosphatase, albumin, bilirubins, bile acids, myocardial enzymes), lipid profiles (HDL cholesterol, total cholesterol, apolipoprotein A), a glycemic control marker (hemoglobin A1c), and inflammation markers (C-reactive protein, serum amyloid protein A) (Table 2, Fig. 2B, C). These data revealed a broad spectrum of altered blood biomarkers in COVID-19 patients compared to healthy controls.
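The text does not state which FDR procedure was applied. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up method and shows how adjusted p-values could be derived across the tested biomarkers; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four biomarkers.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [a < 0.05 for a in adj]
```

A biomarker would be reported as differing between groups when its adjusted p-value falls below the chosen FDR level (0.05 in this sketch).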

Table 1 Baseline characteristics of participants
Fig. 2

Differences in clinical blood indicators between COVID-19 patients and healthy individuals. A Summary of the commonly used clinical indicators of blood testing. B The geographical location of the First Affiliated Hospital of Guangzhou Medical University. C Log2 fold changes of clinical indicators between COVID-19 patients and healthy individuals. Bars in red represent traits up-regulated in COVID-19 patients compared to healthy individuals, while bars in blue indicate traits down-regulated in COVID-19 patients compared to healthy individuals. Color depth reflects the log10-transformed p-value

Table 2 Risk factors for COVID-19 patients from the First Affiliated Hospital of Guangzhou Medical University

Genetic instrumentation and GWAS data for COVID-19 risk factors

To assess the causal nature of potential risk factors on COVID-19, we sourced the summary-level statistics for 25 candidate markers from extensive GWASs. After linkage disequilibrium (LD) pruning, we identified 3 to 510 genetic instruments for the blood indicators, all of which were strong instruments (F-statistics > 10, Table 3). We targeted three COVID-19 subtypes (infection, hospitalization, and severity) as outcomes, reflecting the spectrum of SARS-CoV-2 infection impacts (Table 4). Detailed information on the independent SNPs (after the clumping process) for the candidate markers is listed in Tables S2–S6.

Table 3 Characteristics of 28 candidate risk factors in European ancestry
Table 4 Description of COVID-19 subtypes in European ancestry

The MR analyses revealed the causal roles of blood indicators in COVID-19

Our MR analyses explored the causal impact of 25 candidate factors on COVID-19 outcomes (Tables S7–S9). Notably, genetic liability to higher basophil cell counts (BASO) was linked to a reduced risk of hospitalization due to COVID-19, with an odds ratio (OR) of 0.85 for each standard deviation (SD) increase in BASO (95% CI: 0.73–0.99, Figs. 3, 4, Table S8). Elevated troponin I (TnI) levels were associated with an increased risk of severe COVID-19, while higher BASO, hemoglobin concentration (HGB), and hematocrit (HCT) levels were associated with reduced risk (Figs. 3, 4). Specifically, ORs for severe COVID-19 were 1.21 for TnI, 0.74 for BASO, 0.76 for HGB, and 0.83 for HCT per SD increase (95% CIs: 1.02–1.43, 0.60–0.93, 0.61–0.94, and 0.70–0.97, respectively; Table S9). Our MR analysis confirmed four blood biomarkers affecting COVID-19, aligning with our observational data for all of them (Fig. 5).
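For readers who want to relate the reported ORs to the underlying log-odds estimates, the sketch below converts between the two scales and recovers an approximate p-value. It assumes a Wald-type 95% interval based on the normal distribution, which the text does not confirm, and the function names are introduced here for illustration. The worked input is the reported OR for BASO on hospitalized COVID-19 (0.85, 95% CI 0.73–0.99).

```python
import math

Z_95 = 1.959964  # two-sided 95% normal quantile (assumed Wald-type interval)


def log_odds_to_or(beta, se):
    """Convert a log-odds estimate and its SE to (OR, lower CI, upper CI)."""
    return (math.exp(beta),
            math.exp(beta - Z_95 * se),
            math.exp(beta + Z_95 * se))


def or_to_log_odds(odds_ratio, ci_low, ci_high):
    """Back-calculate the log-odds estimate and SE from an OR and its 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return beta, se


def two_sided_p(beta, se):
    """Two-sided p-value for the Wald z statistic beta / se."""
    return math.erfc(abs(beta / se) / math.sqrt(2))


# Reported estimate for BASO on hospitalized COVID-19.
beta, se = or_to_log_odds(0.85, 0.73, 0.99)
p_value = two_sided_p(beta, se)
odds_ratio, ci_low, ci_high = log_odds_to_or(beta, se)
print(f"beta = {beta:.3f}, SE = {se:.3f}, approximate p = {p_value:.3f}")
```

Because published intervals are rounded, the back-calculated SE and p-value are approximations and may differ slightly from the values in the supplementary tables.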

Fig. 3

Overview of the associations of 28 candidate markers with three subtypes of COVID-19. IVW indicates the inverse variance-weighted method

Fig. 4

The causal association of candidate blood indicators with COVID-19. Forest plot of the causal effects of blood indicators on three subtypes of COVID-19 (COVID-19 infection, hospitalization, and severity). Red diamonds indicate higher odds of COVID-19 (P < 0.05), blue diamonds indicate lower odds of COVID-19 (P < 0.05), and grey diamonds indicate non-significant associations (P > 0.05)

Fig. 5

Summary of the causal association between COVID-19 and blood indicators

The MR revealed the causal roles of COVID-19 on blood indicators

To obtain a deeper understanding of the potential causal mechanisms between COVID-19 and blood factors, we employed bidirectional MR analyses to test whether there is a causal effect of the disease on these biomarkers (Tables S10–S12). Genetic liability to higher creatine kinase B-type (CKB) and total bilirubin (Tbil) was linked to an increased risk of COVID-19 infection, while higher platelet distribution width (PDW) and albumin were associated with reduced risk. Specifically, ORs for severe COVID-19 infection were 1.30 for CKB, 1.12 for Tbil, 0.93 for PDW, and 0.80 for albumin per SD increase (95% CIs: 1.01–1.67, 1.01–1.23, 0.87–0.99, and 0.66–0.97, respectively; Table S10). Notably, genetic liability to higher albumin was linked to a reduced risk of severe COVID-19, with an odds ratio (OR) of 0.96 for each SD increase in albumin (95% CI: 0.93–1.00, Table S12). Our reverse MR analysis confirmed four blood biomarkers influenced by COVID-19, aligning with our observational data for two of them (Fig. 5).

Notably, COVID-19 exhibited a positive causal relationship with troponin I (TnI) and serum amyloid protein A, while a negative association was observed with plateletcrit.

Discussion

To our knowledge, our research is the first comprehensive inference of the causal relationships between common clinical blood indicators and COVID-19. We found significant variations in 36 blood indicators when comparing COVID-19 patients to healthy controls. Subsequent analysis of 25 candidate indicators from GWAS data, using MR methods, established causal links for 4 blood markers with COVID-19. Notably, all of these markers (troponin I, hematocrit, hemoglobin concentration, and basophil cell count) corroborate our observational findings. Additionally, reverse MR analysis affirmed the influence of COVID-19 on four blood biomarkers, with two (albumin and total bilirubin) reflecting our observational data trends. These findings might provide novel insight into the pathophysiology of COVID-19 and may aid in the development of new diagnostic and therapeutic strategies for COVID-19.

Our study has confirmed troponin I (TnI) as a heritable causal risk factor for COVID-19. The linkage of myocardial injury markers like TnI with higher mortality rates in COVID-19 patients is well-documented, with myocardial troponin recognized as a critical prognostic tool for assessing COVID-19 severity and mortality, supported by prior research and clinical bulletins from leading cardiology associations [6, 25,26,27,28]. Moreover, COVID-19 appears to elevate levels of total bilirubin (Tbil), indicating potential pathways for disease impact beyond respiratory symptoms, including liver involvement. Studies have shown liver abnormalities in COVID-19 patients, with elevated Tbil being a significant indicator [29, 30]. It is noteworthy that elevated levels of Tbil may predict adverse outcomes in COVID-19 patients [31, 32], as they reflect the severity of liver damage, which is associated with an increased mortality rate among these patients [33]. Beyond its correlation with the progression and prognosis of COVID-19, Tbil has also been observed to relate to the incidence of ARDS and acute myocardial injury in affected patients [34], with higher Tbil levels corresponding to increased hs-cTnI levels, suggesting that elevated Tbil may be indicative of cardiac injury [35].

In our investigation, we identified basophil cell count, hematocrit, and hemoglobin concentration (HGB) as heritable causal protective factors against COVID-19. Additionally, our findings indicate that COVID-19 may lead to a decrease in albumin levels. Leukocytes are crucial for immune balance and defense against SARS-CoV-2, as evidenced by lower counts in COVID-19 patients compared to controls, which is consistent with multiple reports highlighting their prognostic value [36,37,38,39,40]. Our MR analysis specifically underscores the protective role of basophils, suggesting that monitoring them could be pivotal in managing COVID-19. Moreover, we observed that anemia, marked by reduced hematocrit and HGB levels, may contribute to susceptibility to COVID-19, a finding echoed by recent studies indicating anemia as a prevalent condition in COVID-19 patients [39, 41, 42]. Anemia can be a manifestation of malnutrition, which, in turn, may lead to decreased levels of albumin [43, 44]. HGB has been regarded as a more sensitive indicator of anemia than hematocrit. Nevertheless, recent studies show conflicting results on the association between HGB and COVID-19 [39]. Some papers observed similar HGB levels in deceased patients and those who survived COVID-19 [45], or in ICU patients and those who had not been admitted to the ICU [46], whereas others reported lower HGB levels in COVID-19 patients [39, 47]. These findings suggest that our study results may inform the development of therapeutic interventions that target these blood indicators to reduce the risk of COVID-19.

Our study's strength lies in integrating observational data with Mendelian randomization (MR) analyses, which mutually reinforce the robustness of our findings. The observational study provided a broad scope, while MR analyses reduced confounding and reverse causation, although the possibility of false negatives remains. The utilization of electronic health records facilitated the retrospective collection of extensive data, enhancing our analysis with a substantial sample size. Additionally, employing summary-level GWAS data strengthened the causal inference through genetic instruments. Nevertheless, there are also several limitations to consider when interpreting our results. The study's reliance on European and Chinese data may not be representative of other ethnicities, potentially limiting the broader applicability of our findings. Furthermore, we did not apply additional MR methods, such as MR-PRESSO, which may have left certain pleiotropic effects unaddressed, thus impacting our conclusions. In addition, the potential for sample overlap in the large-scale genomic data used, including between COVID-19 datasets and the UK Biobank, is acknowledged as a challenge that could introduce analytical bias. The issue of pleiotropy, with single genetic variants influencing multiple traits, also remains a concern for the validity of our results.

While our findings shed light on the potential links between blood indicators and COVID-19, we recognize the limitation of not conducting a multivariable Mendelian randomization analysis to assess each indicator's independent effect. Acknowledging this, we suggest that future studies incorporate multivariable approaches to more definitively determine the relationships observed. Moreover, in cases where genetic instruments could not be identified or were not sufficiently reliable, we chose not to proceed with MR analysis to avoid potential biases and misinterpretations. Additionally, examination of the biological pathways linking blood indicators to COVID-19 risk would provide a deeper understanding of the mechanisms underlying these associations. This could involve conducting functional genomics studies to identify the specific genes and molecular pathways that are affected by these blood indicators, as well as animal and cell-based studies to further explore the causal relationships identified in this study. Comprehensive approaches, including the consideration of sample overlap, will be crucial to further validate our findings and enhance the reliability of potential interventions derived from these insights.