Introduction

Since the emergence of SARS-CoV-2, understanding how the virus enters host cells has been a challenge. Much emphasis has been placed on transmembrane serine protease 2 (TMPRSS2), which primes the viral spike (S) protein by mediating its cleavage [1]. Matsuyama et al. demonstrated, using TMPRSS2-expressing cells, that TMPRSS2 enhances SARS-CoV-2 infection [2].

One of the main strategies to combat the virus is to prevent its fusion with host cell receptors, so we have focused on genes related to the target of the receptor-binding domain of the spike protein. One of the main genes is the transmembrane serine protease 2 gene (TMPRSS2), whose product cleaves the spike (S) protein. Moreover, some studies suggest that the TMPRSS2 rs75603675 genotype, combined with dyslipidemia and gender, is a main predictor of disease severity [3]. TMPRSS2 acts jointly with ACE2, and both have been described as the two major host factors contributing to SARS-CoV-2 virulence and pathogenesis [4]. A growing number of studies point to an abnormal predominance of the pro-inflammatory ACE/Ang II/AT1R/Nox pathway over the anti-inflammatory ACE2/Angiotensin/MasR pathway as the probable cause of the chaotic inflammatory responses in COVID-19 [4]. The human myxovirus resistance genes (MX) encode GTPases that are part of the antiviral response induced by type I/III IFNs. MX1 has broad antiviral activity against RNA and DNA viruses, and it partially overlaps with the TMPRSS2 transcript [5]. Moreover, five single nucleotide polymorphisms (SNPs) within TMPRSS2 and near the MX1 gene have been associated with severe COVID-19 [6]. Furthermore, recent publications have reported altered methylation and expression levels of MX1 in critical patients, suggesting that SARS-CoV-2 reduces the expression of this antiviral gene and thereby promotes virus replication and disease progression [7].

Furthermore, drugs designed against SARS-CoV-2, including some proposed and approved by the FDA, target TMPRSS2 and angiotensin-converting enzyme 2 (ACE2) and have been suggested as novel antivirals [8]. Both enzymes (TMPRSS2 and ACE2) are considered the main points of cell-mediated viral entry, and several alkaloids are also being evaluated as inhibitors of these enzymes and as novel drug strategies in COVID-19 [9].

Moreover, in prostate cancer cells, androgens acting on the androgen receptor (AR) have been shown to increase TMPRSS2 expression, which may explain the higher rate of lung infection (and severe COVID-19) in men [10, 11]. Furthermore, co-expression of ACE2, TMPRSS2 and AR has been identified in human alveolar and bronchial epithelial cells, indicating that AR drives ACE2 and TMPRSS2 expression in the lung [12]. Likewise, recent data have suggested AR as an anti-infective target against Omicron lineages, mainly due to the specificity of AR against the spike RBD (receptor-binding domain) of SARS-CoV-2 [13].

TMPRSS2 is also strongly related to other genes involved in prostate cancer, such as ERG, ETV5 and AR, whose characterization could help to classify COVID-19 more clearly [7, 11]. AR has also been reported to play a role in SARS-CoV-2 infection by mediating fibrotic and immune responses in the lung, such as collagen deposition and cytokine levels. TMPRSS2 is a well-known transcriptional target of AR in prostate cancer cells and has recently been shown to be regulated by androgens and anti-androgens in lung cells and mouse lung tissue as well [12]. The role of ERG together with TMPRSS2 in non-prostatic tissues remains to be elucidated, but this gene may be involved in inflammatory responses [14].

As can be seen, TMPRSS2 plays a pivotal role in COVID-19 development. It has been demonstrated to be an essential host cell factor in SARS-CoV-2 infection and constitutes an attractive target for antiviral intervention [15]. However, the other genes mentioned above (apart from ACE2) still lack strong evidence in COVID-19, although we have shown a robust link among all of them. Our main aim is to carry out a molecular characterization of these related genes (TMPRSS2, AR, ACE2 and MX1) to better understand COVID-19 severity, and to open lines of research on molecular biomarkers for COVID-19 management.

Results

Clinical analysis

We compared clinical data between the groups of the present cohort (asymptomatic/mild vs severe disease). Age differed significantly (p < 0.001): severe patients were around 60.1 years old, whereas mild patients were around 45.3. Moreover, ferritin (p < 0.001), D-dimer (p < 0.010), C-reactive protein (CRP) (p < 0.001) and lactate dehydrogenase (LDH) (p < 0.001) also differed significantly. For all of them except D-dimer, the largest proportion of patients with high values was found in the severe group. For D-dimer, in contrast, 60.7% of mild disease patients fell in the highest-value category (> 4.77 ng/mL). Further details of the clinical data are given in Table 1 and in Additional file 1: Table S1.

Table 1 Characteristics of the study population

Genetic analysis

We performed TaqMan® genotyping of three selected SNPs in the genes ACE2 (rs2285666), MX1 (rs469390) and TMPRSS2 (rs2070788). As described in detail in the “Genotyping” section, SNPs were selected according to their relevance in NCBI (in relation to COVID-19) and their allelic frequency (minor allele frequency over 10% in the Caucasian population). The genetic analysis was conducted in two phases: (1) how genetic markers could stratify COVID-19 aggressiveness (asymptomatic/mild vs severe/critical); and (2) how clinical parameters (ferritin, D-dimer, CRP, troponin, lactate dehydrogenase and IL-6) correlate with genetic markers. First, we assessed whether the genetic markers could help to stratify our cohort (mild vs severe), but none of them reached statistical significance. Nevertheless, G carriers of TMPRSS2 (rs2070788) showed a higher risk of developing severe/critical disease, especially women (Table 2).

Table 2 Logistic regression analysis of risk of COVID-19 disease

Secondly, we analysed these SNPs in relation to the main clinical parameters (ferritin, D-dimer, CRP, troponin, lactate dehydrogenase and interleukin-6 (IL-6)), but no significant associations were found; see Additional file 1: Tables S2 and S3 for more details. A meta-analysis was also performed, but no relevant data were obtained; see details in Additional file 1: Table S9.

Allele combination analysis

Studying SNPs in combination could be a better predictive approach than investigating individual polymorphisms: it estimates a more specific risk, reduces the dimension of the association test and increases statistical power.

The most common allele combination for ACE2, MX1 and TMPRSS2 in the present study was TAG, with frequencies of 24.59% in mild patients and 24.74% in severe patients (Additional file 1: Table S4). None of the combinations was significantly associated with a reduced risk of severe COVID-19; however, TAA (OR 0.25; 95% CI 0.06–1.09; p = 0.066) showed a trend towards a protective role. The CAA combination (OR 6.27; 95% CI 1.00–39.22; p = 0.051) was associated with an increased risk of developing severe COVID-19, at the limit of statistical significance.
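As an illustration of how an odds ratio and its 95% confidence interval of the kind reported above can be obtained, the following minimal sketch computes a Wald interval from a 2 × 2 table. It is not the procedure used in this study (the allele-combination estimates come from SNPstats); the function name and the counts in the usage line are hypothetical.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2 x 2 table.

    a: carriers among severe cases      b: carriers among mild cases
    c: non-carriers among severe cases  d: non-carriers among mild cases
    z: normal quantile (1.96 gives a 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for a Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only (not data from this cohort)
or_, lo, hi = odds_ratio_wald(10, 20, 5, 40)
print(f"OR {or_:.2f}; 95% CI {lo:.2f}-{hi:.2f}")
```

An interval that includes 1, as for TAA above, indicates that the association is not significant at the 5% level.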

Gene expression analysis

Differential expression analysis was performed as previously described. The gene expression value for each patient was calculated as the mean ± SD (standard deviation) of three replicates. Tukey’s range test was used to detect anomalous values. As shown in Additional file 1: Table S5, AR and MX1 expression differed significantly according to aggressiveness (p = 0.002 and p = 0.036, respectively), with higher expression in asymptomatic/mild patients (Fig. 1).
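The screening for anomalous values can be sketched as follows, assuming that it corresponds to Tukey's fences (values beyond 1.5 × IQR from the quartiles). That interpretation, the function name and the example values are assumptions made for illustration; the text does not specify the exact rule or the quartile method.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences.

    Fences are Q1 - k*IQR and Q3 + k*IQR; k = 1.5 is the conventional choice.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical relative-expression values, for illustration only
print(tukey_outliers([1.0, 1.1, 0.9, 1.2, 1.0, 5.0]))
```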

Fig. 1 Gene expression analysis comparing aggressiveness in COVID-19 disease. (*) p < 0.05

The association of sociodemographic factors (age and gender) and gene expression levels with COVID-19 aggressiveness is shown in Table 3. Younger age and higher ERG expression levels increased the risk of severe COVID-19 (p = 0.042). For AR, higher expression levels were related to a decreased risk of severe COVID-19 (p = 0.025) in female cases.

Table 3 Binary logistic regression model for the aggressiveness in COVID-19 disease assessed by age and gender

In silico analysis

Most of the variants are located in introns or in upstream/downstream regions of the genes, and they do not produce any amino acid change (see details in Additional file 1: Table S6). As expected, data from DAVID Bioinformatics Resources confirmed that the studied genes are strongly associated with coronavirus disease, in line with their clinical implication. ACE2, MX1 and TMPRSS2 were found in the disease development pathway (p = 8.1 × 10⁻⁴). ACE2 and TMPRSS2 are involved in early stages of infection, while MX1 acts in later stages. Moreover, two of the genes of interest (ACE2 and TMPRSS2) appear to be closely related and involved in the same molecular process of membrane fusion (p = 4.4 × 10⁻³), where both act as proteases (p = 0.047). In addition, most of the studied genes (TMPRSS2, ERG, ETV5 and AR) are associated with pathways involved in the development of prostate cancer. A relationship of these genes with signal transduction that induces membrane fusion (p = 0.011) and with positive regulation of transcription from RNA polymerase II (p = 0.034) can also be inferred.

Interestingly, GTEx data showed significantly higher expression of TMPRSS2 in lung tissue for the GG genotype of rs2070788 (p = 8.9 × 10⁻⁹), and of MX1 for the AA genotype of rs469390 (p = 9 × 10⁻⁸). Although we did not observe significant differences in our samples, our expression analyses were performed in blood and showed a similar shift for rs469390; see details in Additional file 1: Fig. S1.

In silico analysis using different miRNA target prediction tools identified numerous miRNAs potentially targeting the ACE2, MX1 and/or TMPRSS2 genes. Among all predicted miRNAs, only a few were related to the respiratory system. Finally, eleven miRNAs were highlighted as master regulators of the studied genes (Additional file 1: Fig. S2).

Moreover, STRING analysis also reported a close relationship of TMPRSS2 with the remaining studied genes. It shares an implication in prostate cancer with ETV5, AR and ERG, and it shares with ACE2 its importance in the development of COVID-19. The study of MX1 is also important because of its physical proximity to TMPRSS2 on chromosome 21, although the two genes are not connected in these pathways (Additional file 1: Fig. S3).

Discussion

Characterization of molecules involved in the infection process for classifying COVID-19 severity remains a challenge in present clinical practice. A deeper understanding of mechanisms for SARS-CoV-2 infection involves investigating the host proteins used by this virus, such as TMPRSS2 [16,17,18].

Here we focus on TMPRSS2 and the related genes ACE2, MX1, AR, ETV5 and ERG, on their own or combined with clinical data, as an easy and relevant classifier of COVID-19 aggressiveness. Although there are many publications with this aim, the data reported are controversial.

A multi-omic approach developed by J.S. Maras et al. demonstrated that an increased basal level of MX1 is correlated with SARS-CoV-2 infection [19]. This finding could aid in identifying patients predisposed to high severity. Moreover, many efforts are focused on the role of ACE2 and TMPRSS2 as key markers of COVID-19 severity. It has been demonstrated that human endothelial cells express the main cofactors needed for SARS-CoV-2 internalization, including ACE2, TMPRSS2 and CD-147 [20], so they are involved in the disease directly or indirectly.

This is the first time that the MX1 gene (rs469390) has been studied in relation to COVID-19 and that its utility as an expression biomarker distinguishing asymptomatic from severe patients has been shown. MX1 is an interferon-induced gene product that plays important roles in inflammation in the lower respiratory tract, which is relevant to the development of severe COVID-19. Moreover, MX1 is included in the calculation of an “inflammation index” that is highly expressed in COVID-19 patients and has a high diagnostic yield [21]. However, we could not find differences in rs469390 of the MX1 gene according to severity or clinical parameters, although it has been suggested as a good respiratory biomarker because of its interaction with TMPRSS2 [22].

ACE2 is recognized as the main receptor for SARS-CoV-2 and, together with TMPRSS2, is required for virus entry. High expression patterns of ACE2 and dipeptidyl peptidase 4 (DPP4) can be detected in blood and alveolar lavage fluid of patients with chronic obstructive pulmonary disease and asthma [21, 23]. We selected its main SNP (rs2285666) but found no significant associations, although it has previously been associated, under recessive models of inheritance, with an increased risk of hospitalization and a severe course of the disease in a Caucasian population [22], and the same was reported in Indian populations [24]. This SNP is located in intron 3, and intronic regions have been reported to play a relevant regulatory role in ACE2. Specifically, it has been suggested that reduced expression of ACE2 may lead to an imbalance of the renin-angiotensin system in patients with COVID-19, which may represent a major pathological outcome of viral infection [25]. A study in an Iranian population indicated that the rs2285666 GG genotype, or the G allele on its own, is associated with the incidence of COVID-19 [26].

Regarding TMPRSS2, we focused on rs2070788, which has been suggested to be relevant in virus infections in combination with other SNPs in this gene. As described by K. Schonfelder et al. in a German cohort, we could not find a relation between this SNP and infection risk or severity in COVID-19 [1]. Moreover, it has been reported that the GG genotype of rs2070788 shows the highest expression in lung compared with other genotypes [22]. Accordingly, we found that G carriers of TMPRSS2 (rs2070788) have a higher risk of developing severe/critical disease, especially women. Furthermore, GTEx data showed higher expression for the GG genotype in lung tissue, in contrast to our expression analyses, which showed a lower expression level of this gene in G-carrier patients. This might be caused by variations in expression between tissue types. The association between expression and disease severity could be due to the protein being more readily available to the virus at the cell membrane, resulting in more successful infection [5, 6]. A relationship was also found in the GTEx database between rs469390 in MX1 and a higher expression level in the lung of A carriers. These data are in accordance with our results and with previous observations associating the AA genotype with higher susceptibility to COVID-19.

TMPRSS2 is a protein belonging to the serine protease family, and its gene is known to undergo fusion with genes of ETS transcription factors, such as ERG and ETV1. The TMPRSS2:ERG gene fusion is the most frequent genomic alteration in several tumour cases and results in overexpression of the transcription factor ERG [20]. Other authors have suggested that COVID-19 severity is higher in men because of an important role of androgens in the SARS-CoV-2 infection mechanism [7, 10, 11]. That is why we also included ERG and AR expression analysis in the present study. We found that higher ERG expression levels increased the risk of severe COVID-19, a role described here for the first time. ERG has been described as a potential drug target for the treatment of COVID-19, but it has not been reported as a severity biomarker [27].

Moreover, we also found that AR expression differs between asymptomatic/mild and severe/critical patients. This finding is in line with data reported in LNCaP cancer cells, in which treatment with AR antagonists used in prostate cancer (apalutamide, darolutamide and enzalutamide) inhibited SARS-CoV-2 infection [7]. Furthermore, gender disparities in COVID-19 infection have been attributed to higher levels of ACE2 and TMPRSS2 in males, as well as to hormonal influences on the immune response [7].

Moreover, the VEP and DAVID analyses reinforce the strong association of these three genes and their variants with coronavirus disease, in line with their clinical implication. STRING analysis corroborated these results, emphasizing the role of ACE2 and TMPRSS2 as interesting biomarkers of COVID-19.

Furthermore, we found that the ACE2, MX1 and TMPRSS2 combination TAA could play a protective role, in contrast to CAA, which was associated with an increased risk of developing severe COVID-19 at the limit of statistical significance. This is the first time that a combination of these SNPs has been analysed in association with COVID-19 risk.

Recent studies have focused on the search for miRNAs that target the main genes related to COVID-19 aggressiveness, such as ACE2 and TMPRSS2. These analyses suggested that hsa-miR-32-5p and hsa-miR-1246 levels were altered in critical versus asymptomatic individuals [28] and that hsa-miR-200 could also affect ACE2/TMPRSS2 expression [29]. Here, functional analysis identified hsa-miR-98-5p, hsa-miR-202-3p, hsa-miR-4458 and hsa-miR-4500 as the most relevant ones, among others from the hsa-let-7 family. Only one of them, hsa-miR-98-5p, has also been reported to target a SARS-CoV-2 gene (ORF1ab) [30].

We found that several clinical parameters, such as ferritin, D-dimer, LDH and CRP, are good markers to distinguish mild from severe patients. Previous reports have also indicated that high values of these biomarkers, in combination with absolute neutrophil count, neutrophil-to-lymphocyte ratio (NLR) and platelet-to-lymphocyte ratio (PLR), are associated with disease severity [31]. Moreover, age was a significant variable: age over 55 is associated with an increased risk of severe COVID-19. Data published by H. Ashktorab et al. [32] are consistent with our results, indicating that, when analyses were adjusted for disease severity, the significant variables were age over 65 years, male sex, shortness of breath, and elevated CRP and D-dimer. Although data on the protection conferred by the influenza vaccine against COVID-19 are controversial, we did not find any relevant protective role. Similarly, studies of co-administration of influenza and COVID-19 vaccines reported no effect on the humoral response [33].

Conclusion

To sum up, the inclusion of these three markers, ACE2 (rs2285666), TMPRSS2 (rs2070788) and MX1 (rs469390), opens new strategies for the classification of COVID-19 patients. The CAA allele combination of these SNPs was associated with an increased risk of developing severe COVID-19, at the limit of statistical significance (p = 0.051). Similarly, the G allele of TMPRSS2 (rs2070788) was associated with an increased risk of developing severe COVID-19 in female cases. We emphasize the use of MX1, AR and ERG as biomarkers in gene expression analysis, since the most representative differences were found in these data. Moreover, a better understanding of the molecular mechanisms of SARS-CoV-2 infection could contribute to effective management of the infection and to the inclusion of diagnostic and therapeutic biomarkers in clinical practice. However, one limitation of the present results is the limited sample size, although the cohort is very well classified clinically and genetically. A further study with a larger sample size will improve the present data.

Materials and methods

Patients

Peripheral blood collected in EDTA tubes and Tempus™ RNA tubes was obtained from each patient. Samples were processed within 4–6 h of collection, following a protocol that depended on the subsequent analysis, and were frozen at −80 °C until analysis.

All collected samples had a COVID-19 diagnosis confirmed by RT-PCR and positive IgG serology. All of them met inclusion criteria based on the WHO classification. Inclusion criteria were revised periodically to update the database, aiming for samples balanced by age, gender and severity. Patients with mild disease were characterized by fever, malaise, cough, upper respiratory symptoms and/or less common features of COVID-19 (headache, loss of taste or smell, etc.). Patients in the severe disease group fulfilled the following features: (i) hypoxia: SpO2 ≤ 93% on atmospheric air or PaO2:FiO2 < 300 mmHg (SF ratio < 315); (ii) tachypnea: respiratory distress or RR (respiratory rate) > 30 breaths/minute; or (iii) more than 50% involvement seen on chest imaging [34].
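The severity criteria listed above can be expressed as a simple rule. The sketch below assumes that meeting any one of the three criteria is sufficient (the text joins them with "or"); the record type, its field names and the handling of missing measurements are hypothetical and do not describe the study database. The SF ratio criterion is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Assessment:
    """Hypothetical patient record; None means the value was not measured."""
    spo2_room_air: Optional[float] = None        # SpO2 (%) on atmospheric air
    pao2_fio2: Optional[float] = None            # PaO2:FiO2 (mmHg)
    resp_rate: Optional[float] = None            # breaths per minute
    respiratory_distress: bool = False           # clinical judgement
    imaging_involvement: Optional[float] = None  # % involvement on chest imaging


def is_severe(a: Assessment) -> bool:
    """Apply the three severe-disease criteria; any one is assumed sufficient."""
    hypoxia = (
        (a.spo2_room_air is not None and a.spo2_room_air <= 93)
        or (a.pao2_fio2 is not None and a.pao2_fio2 < 300)
    )
    tachypnea = a.respiratory_distress or (
        a.resp_rate is not None and a.resp_rate > 30
    )
    imaging = a.imaging_involvement is not None and a.imaging_involvement > 50
    return hypoxia or tachypnea or imaging


# Hypothetical patients, for illustration only
print(is_severe(Assessment(spo2_room_air=92)))                # hypoxia
print(is_severe(Assessment(spo2_room_air=97, resp_rate=18)))  # no criterion met
```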

A total of 329 samples (n = 186 mild and n = 143 severe) from patients with a mean age of 55 years (range 33–80) were recruited from 2020 to 2022. All clinical data (ferritin, D-dimer, CRP, troponin and LDH), symptoms (fever, anosmia, asthenia, dyspnoea, long COVID, etc.) and intensive care unit (ICU) clinical follow-up (need of assisted ventilation, pneumonia, etc.) were included in the report; details are described in Table 1.

The study protocol was approved by the Ethics Committee (CEI) with internal code 1329-N-21. Informed written consent from all participants was obtained in accordance with the tenets of the Declaration of Helsinki.

DNA extraction

Genomic DNA was extracted from whole blood of all samples using the RealPure “SSS” kit (Durviz, Spain). DNA was quantified by fluorescence using Qubit™ 3.0 (Invitrogen™ by Thermo Scientific, USA) and with a NanoDrop 2000 system (Thermo Scientific, USA), which was also used to check the 260/280 ratio as quality control. Extracted DNA was stored at −20 °C until genotyping.

RNA extraction

Total RNA from 258 plasma samples (n = 160 mild and n = 98 severe) collected in Tempus™ Blood RNA tubes was extracted following the Tempus™ Spin RNA Isolation Kit protocol. Quality was assessed by the A260/A280 ratio on a NanoDrop™ 2000c. This analysis was performed in a sub-selection of samples to assess the role of the studied genes in expression patterns; only samples with complete clinical data were included.

Genotyping

SNPs were selected from the NCBI website according to the most relevant data on COVID-19 and TMPRSS2 in papers published until 2022. Moreover, only SNPs with a minor allele frequency (MAF) over 10% in the Caucasian population, according to the Ensembl database, were selected. We finally selected ACE2 (rs2285666), MX1 (rs469390) and TMPRSS2 (rs2070788) for the present analysis; see details of probes in Additional file 1: Tables S7 and S8.

DNA genotyping was performed using TaqMan® Genotyping Master Mix (Applied Biosystems, USA). Allelic discrimination assays were carried out in a 7900HT Fast Real-Time PCR System (Applied Biosystems, USA).

Reverse transcription PCR and quantitative real‑time PCR (qPCR)

RNA reverse transcription was performed using the PrimeScript RT Reagent Kit (Takara Bio, JP). Quantitative polymerase chain reaction (qPCR) was performed with SYBR Green designed probes (Life Technologies, CA) for the genes ERG, ETV5, AR and MX1, and with TaqMan™ gene expression assays (Thermofisher, USA) for ACE2 (Assay ID: Hs01085333_m1) and TMPRSS2 (Assay ID: Hs01122322_m1), on a 96-well plate with a QuantStudio 6 Flex Real-Time PCR System (Applied Biosystems, USA). These genes were selected for having a relevant interaction with the main genes of the present article (MX1, ACE2 and TMPRSS2). Primers were designed using Primer-Blast (NIH) software under the following conditions: spanning an exon-exon junction, PCR product size between 60 and 150 nucleotides, and primer melting temperature within the range of 59–61 °C. Details of probes are in Additional file 1: Table S9. qPCR reactions were performed as follows: 95 °C for 10 min for enzyme activation, followed by 45 cycles of 15 s at 95 °C (denaturation) and 1 min at 60 °C (annealing/extension). All samples were run in triplicate, with an NTC (non-template control) on each plate. Threshold cycle (Ct) values ≥ 35 were considered undetermined. mRNA expression levels were quantified using the comparative threshold cycle method (2^−ΔΔCt) relative to HPRT1 (hypoxanthine phosphoribosyltransferase 1) expression as an endogenous control. The relative quantification parameter (RQ or 2^−ΔΔCt) was estimated for each case and used in the statistical analysis. To differentiate between high and low expression levels, we used the median RQ value for each gene: values below the median were classified as low expression and values above the median as high expression for ERG, ETV5, AR, MX1, ACE2 and TMPRSS2; see details in Additional file 1: Table S5.
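The comparative threshold cycle calculation and the median split described above can be sketched as follows. This is a minimal illustration under stated assumptions: the calibrator ΔCt is passed in as a parameter because the text does not specify the calibrator; a sample is treated as undetermined if any replicate has Ct ≥ 35; values equal to the median are assigned to the low group. Function names and example values are hypothetical.

```python
import statistics
from typing import Dict, Optional, Sequence

CT_CUTOFF = 35.0  # Ct >= 35 is considered undetermined, as stated in the text


def mean_ct(cts: Sequence[float]) -> Optional[float]:
    """Mean Ct of the replicates; None if any replicate is undetermined (assumption)."""
    if any(ct >= CT_CUTOFF for ct in cts):
        return None
    return statistics.mean(cts)


def relative_quantity(
    target_cts: Sequence[float],
    hprt1_cts: Sequence[float],
    calibrator_delta_ct: float,
) -> Optional[float]:
    """RQ = 2^-ddCt, with HPRT1 as the endogenous control."""
    target = mean_ct(target_cts)
    reference = mean_ct(hprt1_cts)
    if target is None or reference is None:
        return None
    delta_ct = target - reference                     # dCt of the sample
    delta_delta_ct = delta_ct - calibrator_delta_ct   # ddCt versus the calibrator
    return 2.0 ** (-delta_delta_ct)


def median_split(rq_by_sample: Dict[str, float]) -> Dict[str, str]:
    """Label each sample 'high' (above the median RQ) or 'low' (otherwise)."""
    cutoff = statistics.median(rq_by_sample.values())
    return {s: ("high" if rq > cutoff else "low") for s, rq in rq_by_sample.items()}


# Hypothetical triplicates, for illustration only
print(relative_quantity([25.0, 25.0, 25.0], [20.0, 20.0, 20.0], calibrator_delta_ct=6.0))
print(median_split({"s1": 0.5, "s2": 1.0, "s3": 2.0}))
```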

In silico analysis

The variants of the present study were analysed with the Variant Effect Predictor (https://www.ensembl.org/info/docs/tools/vep/index.html) [35]. This software was used to calculate changes in transcripts and malignancy of variants. We also used the ClinVar tool (https://www.ncbi.nlm.nih.gov/clinvar/) for data validation.

A functional analysis of TMPRSS2 and associated genes (ACE2, ERG, AR, ETV5 and MX1) was performed using IPA (Ingenuity Pathway Analysis) [29] and DAVID Bioinformatics Resources v6.8 (https://david.ncifcrf.gov/) to determine the role of the gene pool, its clinical implication, ontology and the metabolic pathways involved. Moreover, the STRING search tool was used to represent a protein–protein interaction (PPI) network including our target genes (https://string-db.org/) [36]. GTEx was used to analyse differential gene expression in lung tissue according to the presence of the studied SNPs in ACE2, MX1 and TMPRSS2. The data used for the analyses described in this manuscript were obtained from the GTEx Portal (https://gtexportal.org/home/faq#citePortal).

Finally, miRDB database (http://mirdb.org/) was used to predict the miRNAs and their targeted genes, and the network image was obtained using the miRNet online tool (https://www.mirnet.ca) [37]. This analysis was performed using the default parameters, and Homo sapiens was selected as the specific taxonomy.

Statistical analysis

Hardy–Weinberg equilibrium (HWE) was tested using the online SNPstats and Metagenyo tools [38]. The SPSS v.22 software package (IBM Corporation, USA) was used for statistical analyses. The associations of SNPs with COVID-19 severity and with clinical outputs were analysed by the chi-square test (χ2) or Fisher's exact test for small sample sizes. Binary logistic regression analyses under different genetic models (codominant, dominant and recessive) were used to assess which genetic factors might be determinant in aggressiveness.
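For illustration, a Hardy–Weinberg check for a biallelic autosomal SNP can be sketched with a one-degree-of-freedom chi-square test. This is a simplified stand-in, not the procedure implemented by SNPstats or Metagenyo; the function name and the genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square test (1 df) of Hardy-Weinberg equilibrium from genotype counts.

    Returns (chi2, p_value). Requires both alleles to be present.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic marker: HWE test is not defined")
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotype counts, for illustration only
print(hwe_chi_square(25, 50, 25))   # counts exactly at equilibrium
print(hwe_chi_square(40, 20, 40))   # strong heterozygote deficit
```

Because ACE2 is located on the X chromosome, a test of this kind would apply only to female genotypes for rs2285666.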

To analyse clinical variables (ferritin, D-dimer, CRP, troponin, lactate dehydrogenase and IL-6), we divided each one into two groups according to their values (low or high) and compared them with single SNPs. To better determine the contribution of the SNPs to COVID-19 aggressiveness, SNP combinations were analysed using the online SNPstats software.

Gene expression analysis was performed using a non-parametric test (Mann–Whitney U) for all variables, because the Shapiro–Wilk test revealed that our results did not follow a Gaussian distribution. The statistical significance level was p < 0.05.
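The group comparison described above can be illustrated with a minimal, dependency-free Mann–Whitney U test. The sketch assumes a two-sided test with a normal approximation and tie correction, without continuity correction; SPSS may report exact or differently corrected p-values, so the results are not expected to match to the last digit. The function name and the example values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for x, p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1   # ranks are 1-based; ties share the mean rank
        for k in range(i, j + 1):
            ranks[k] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0                  # all values identical: no evidence of a shift
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_value


# Hypothetical RQ values for two groups, for illustration only
u, p = mann_whitney_u([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(u, round(p, 3))
```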