INTRODUCTION

The SARS-CoV-2 coronavirus, first identified in 2019, led to a pandemic of a severe acute respiratory viral infection known as COVID-19 (from COronaVIrus Disease 2019). According to official statistics, as of June 21, 2020, the pandemic had reached 215 countries (territories), more than 8.8 million people had been infected with COVID-19, and more than 465 000 people had died [1].

SARS-CoV-2 is an RNA beta-coronavirus (CoV). Seven coronaviruses are known to infect humans: the alpha-coronaviruses HCoV-229E and HCoV-NL63 and the beta-coronaviruses HCoV-OC43, HCoV-HKU1, SARS-CoV, MERS-CoV, and SARS-CoV-2 [2]. Coronaviruses differ in pathogenicity: 229E, OC43, NL63, and HKU1 cause mild forms of infectious disease, while SARS-CoV, MERS-CoV, and SARS-CoV-2 are highly pathogenic (SARS-CoV caused atypical pneumonia in 2002–2003; MERS-CoV caused Middle East respiratory syndrome in 2015) [2, 3]. The clinical pictures of diseases caused by highly pathogenic coronaviruses have both shared and specific features [4–6]. In particular, compared with the infectious disease caused by MERS-CoV, COVID-19 is characterized by greater contagiousness but lower mortality, and by significant heterogeneity of clinical symptoms, including across age groups and across countries of the world [5, 7–9]. The most common symptoms of COVID-19 are cough, fever, and dyspnea; myalgia, diarrhea, nausea, and vomiting are also recorded [6, 8]. However, individual clinical symptoms vary widely: from an asymptomatic course and a mild cold to complications such as bronchitis, pneumonia, severe acute respiratory distress syndrome, and multiple organ failure [6].

Human susceptibility to coronavirus infection and the clinical picture of the disease depend on many causes, among which a certain role belongs to the genetic characteristics of both the virus and the person [10–13]. The initial stage of infection is the contact of the coronavirus with cells susceptible to it. In humans, about 100 genes whose products are involved in the process of virus entry into the cell are known; they are classified into five Gene Ontology categories: viral entry into the host cell (GO:0046718); virus receptor activity (GO:0001618); fusion of the virus membrane with the host plasma membrane (GO:0019064); clathrin-dependent endocytosis of the virus by the host cell (GO:0075512); and receptor-mediated endocytosis of the virus by the host cell (GO:0019065) [14, 15]. As the interactions between viruses and host cells are studied further, the number of known genes that affect susceptibility to coronavirus infection will increase. At the same time, coronaviruses also differ in the structure of the genome and of the key molecules responsible for their entry into cells [11, 16, 17]. Therefore, not all human genes whose products are involved in the regulation of the entry of viruses into cells turn out to be significant for the risk of developing the infection caused by a particular coronavirus.

The present review analyzes the features of the SARS-CoV-2 genome and the structural and functional properties of human genes and proteins responsible for the susceptibility to this pathogen.

PATHOGENETICALLY SIGNIFICANT FEATURES OF THE SARS-COV-2 GENOME

The SARS-CoV-2 genome is structurally similar to the genomes of two bat coronaviruses, bat-SL-CoVZC45 and bat-SL-CoVZXC21 (the identity is 88–96%), and has much less similarity to the genomes of SARS-CoV (79–80%) and MERS-CoV (about 50%) [17, 18]. The genomes of individual SARS-CoV-2 samples obtained from Chinese patients were more than 99% similar to one another [18], but estimates of genetic diversity are being refined as the number of studies on the structure of SARS-CoV-2 genomes from different regions of the world increases [19, 20].

The SARS-CoV-2 genome is characterized by lower values of the effective number of codons compared with SARS-CoV and MERS-CoV, which suggests a higher expression efficiency of genes, including genes encoding the structural proteins: spike (S), envelope (E), membrane (M), and nucleocapsid (N) [2, 21]. The S protein, which is responsible for the attachment of SARS-CoV-2 to the membrane receptor, is considered critical for the entry into the host cell [22]. SARS-CoV-2 (like HCoV-NL63 and SARS-CoV) uses membrane-bound angiotensin-converting enzyme 2 (ACE2) as a receptor [17, 23–25]. Accordingly, SARS-CoV-2 can penetrate into all cells expressing ACE2. However, coronaviruses can also use a number of other molecules as receptors to enter the cells of the host organism [25–27].

The structure of the SARS-CoV-2 S protein, as compared with other coronaviruses, has some features that are important for the entry into the host cell [16, 17, 28]. The identity between the S proteins of SARS-CoV and SARS-CoV-2 is 72–75% [16, 17]. P. Zhou et al. [17] listed three short insertions in the N-terminal domain and changes in four of the five key residues of the receptor-binding motif among the main differences in the sequence encoding the SARS-CoV-2 S protein. As compared with SARS-CoV, SARS-CoV-2 is characterized by a higher affinity for binding to the ACE2 receptor due to substitutions in the C-terminal domain of the S protein (there are more interactions with ACE2, H-bonds, etc.) [16, 28, 29], a better capability of fusion with the plasma membrane of the host cell due to amino acid substitutions in the HR1 domain of the S2 subunit of the S protein [30], and, in general, a higher activity [23].

Comparative analysis of the genomes of SARS-CoV-2 isolates from different regions of the world revealed multiple mutations, including in genes encoding structural proteins (the S protein among them) [20]. For example, clinical isolates from India carried a mutation at position 407 in the receptor-binding domain of the S protein; it replaces the positively charged amino acid arginine with the hydrophobic C-beta-branched amino acid isoleucine and thereby changes the secondary structure of the protein [19]. Amino acid substitutions in the S protein were also found in SARS-CoV-2 from other geographic regions [31]. Mutations that change the structure of key proteins responsible for the binding of the coronavirus to its receptor can affect the pathogenicity of the coronavirus.

After the S protein binds to the ACE2 receptor, the protein is cleaved into the S1 and S2 subunits, which initiates fusion of the viral envelope with the cell membrane and the entry of the coronavirus into the cell [22]. For this purpose, coronaviruses (like other viruses) use various proteases of host cells, the choice of which is determined by the structural features and the availability of the site for proteolytic cleavage of the S protein [32]. For example, it was shown that, as compared with laboratory strains from 1966, modern clinical isolates of the HCoV-229E alpha-coronavirus use TMPRSS2, not cathepsin L, for the entry, which is associated with two amino acid substitutions (R642M and N714K) in the S protein [33]. SARS-CoV-2 also uses TMPRSS2 for activation and cleavage of the S protein [34–36]. In addition, substitutions in the S protein gene leading to the formation of a furin-like cleavage site, which is absent in other SARS-like coronaviruses, were identified in SARS-CoV-2 [25, 37]. The presence of the furin cleavage site in SARS-CoV-2 determines the ability of virus–cell and cell–cell fusion, which ensures the high pathogenicity of SARS-CoV-2 compared with other beta-coronaviruses [38, 39].

Since the entry of coronaviruses into the cell occurs with the participation of protein products of the cells of the host organism, it can be expected that the features in the structure of genes encoding these proteins will determine individual differences in the susceptibility to coronaviruses, including SARS-CoV-2.

GENETIC FACTORS OF THE SUSCEPTIBILITY TO SARS-CoV-2 IN HUMANS

On the basis of a classical twin study, heritability was calculated for some symptoms of COVID-19: 41% (95% confidence interval 12–70%) for fever, 47% (27–67%) for anosmia, and 49% (24–75%) for delirium; the predicted heritability of COVID-19 was estimated at 50% (29–70%) [40]. These estimates indicate a significant genetic component in the formation of symptoms of this disease. Studies devoted to the identification of genetic markers of the susceptibility to SARS-CoV-2 are still few [12, 41, 42].
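As a rough illustration of how a twin design can yield a heritability estimate, the sketch below applies Falconer's approximation, h^2 = 2 (r_MZ - r_DZ), where r_MZ and r_DZ are the trait correlations in monozygotic and dizygotic twin pairs. This is an assumption made for illustration only: the model actually used in [40] may differ, and the correlations passed to the function are hypothetical values, not data from that study.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Falconer's approximation h^2 = 2 * (r_MZ - r_DZ), clipped to [0, 1]."""
    h2 = 2.0 * (r_mz - r_dz)
    return max(0.0, min(1.0, h2))


# Hypothetical twin correlations, chosen only to show the arithmetic.
h2 = falconer_heritability(r_mz=0.60, r_dz=0.35)
print(f"h^2 = {h2:.2f}")
```

The clipping reflects that a heritability outside the interval from 0 to 1 has no interpretation; with real data such values usually signal sampling noise or a violated model assumption.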

A number of genes whose products can participate in the process of human infection with coronaviruses, including SARS-CoV-2, are known (Table 1). In addition to receptor genes (ACE2, ANPEP, DPP4) and protease genes (TMPRSS2, FURIN, TMPRSS11D, CTSL, CTSB), which contribute to the entry of the coronavirus into the cell, these include the gene of ATP-dependent RNA helicase DDX1 (DDX1), which promotes replication of coronaviruses (as was shown for SARS-CoV) [56], and the genes IFITM1, IFITM2, and IFITM3, which encode interferon-induced transmembrane proteins [12, 57, 58].

Table 1. Characteristics of candidate genes for the susceptibility to human SARS-CoV-2

The role of ACE2 as a SARS-CoV-2 receptor is considered proven [16, 23, 25, 28, 39, 60]. Data on whether SARS-CoV-2 can use other receptors known for coronaviruses are ambiguous. On the basis of modeling the structural interactions between the S protein and receptors of other coronaviruses [49], a number of researchers [27, 48] listed dipeptidyl peptidase 4 (DPP4, also known as CD26), which acts as a receptor for MERS-CoV, among the potential SARS-CoV-2 receptors. Moreover, the DPP4 regions critical for binding to the S protein are reported to be identical for SARS-CoV-2 and MERS-CoV [47]. At the same time, P. Zhou et al. [17] found in an in vitro study that SARS-CoV-2 does not use DPP4 as a receptor. The same authors did not confirm the significance of aminopeptidase N (ANPEP, also known as APN and CD13) as a SARS-CoV-2 receptor. Interestingly, ANPEP is a receptor for HCoV-229E but not for HCoV-OC43 [26]. However, ANPEP and DPP4 have expression patterns similar to those of ACE2 in 13 tissues, which allowed F. Qi et al. [61] to consider them as potential receptors for SARS-CoV-2. Further studies are needed to resolve the controversy regarding the receptor properties of DPP4 and to clarify the possible receptor role of ANPEP for SARS-CoV-2.

After SARS-CoV-2 attaches to the ACE2 receptor, proteases, including TMPRSS2 and FURIN, are required to ensure the entry of the coronavirus into the cell [29, 37, 39]. D. Bestle et al. [51] found that, for SARS-CoV-2 to enter the cells, its S protein must be cleaved at two different sites by host cell proteases: by TMPRSS2 at the S2 site and by FURIN at the S1/S2 site.

The TMPRSS2 transmembrane protease and related proteases (such as TMPRSS11D) promote the entry of the coronavirus into the cell by two mechanisms: by cleaving the ACE2 receptor (which can stimulate the coronavirus uptake) and by cleaving the S protein of the coronavirus (which leads to the protein activation and fusion of the viral envelope with the membrane of the host organism cell) [35, 39]. The study by S. Bertram et al. [52] showed that TMPRSS2 activates cathepsin-independent entry of HCoV-229E into the host cell, and TMPRSS2 activation protects 229E-S-dependent entry into the cell from inhibition by interferon-induced transmembrane proteins. In model animals, it was found that the deficiency of this protease in the cells of the respiratory tract reduces the severity of SARS-CoV and MERS-CoV, which indicates the importance of TMPRSS2 in the infection process [62].

The entry of SARS-CoV-2 into the cell is also activated by lysosomal cathepsins [29]. Unlike TMPRSS2, cathepsins promote the entry of coronaviruses into the cell through endocytosis (lysosomal pathway); the choice of a protease for the entry is determined by the structural features of the S protein of the viruses [33]. The TMPRSS2 protease and lysosomal cathepsins have a cumulative effect with furin on the activation of the SARS-CoV-2 entry into the cells of the host organism [29].

Interferon-induced transmembrane proteins (IFITM1, IFITM2, IFITM3) are able to restrict replication of coronaviruses and their entry into cells and inactivate new coronaviruses when they are released from infected cells (shown for SARS-CoV and other viruses such as influenza A virus, Dengue virus, and West Nile virus) [57–59, 63]. Since IFITM proteins are induced by type I and II interferons, they are believed to be crucial for the antiviral action of interferon [63]. In the study by Y. Zhang et al. [12], it was found that the structural features of the IFITM3 gene are associated with the severity of the course of COVID-19.

Thus, most of the genes included in Table 1 are categorized as those having proven or potential significance in terms of the involvement of their protein products in infection of the cells with SARS-CoV-2. Five genes (TMPRSS11D, ANPEP, DDX1, IFITM1, IFITM2) can be considered as likely significant, since the products encoded by them are involved in the entry processes of other coronaviruses (SARS-CoV, HCoV-229E) and a number of other viruses into the cells of the host organism [35, 52, 56–58] (see Table 1). Further in the paper, all these genes will be designated as candidate genes for the susceptibility to SARS-CoV-2.

Among the genes considered, only ACE2 is localized on the X chromosome, which may be a cause of sex differences in the susceptibility to SARS-CoV-2 [64–66]; the other genes are localized on eight different chromosomes (Table 1).

The candidate genes for the susceptibility to SARS-CoV-2 are expressed in many organs/tissues, including those sensitive to this coronavirus (according to clinical observations [7–9]). However, the level of their expression differs between organs (Fig. 1). On the basis of the expression level of the candidate genes, tissues are grouped into several clusters (Fig. 1). One cluster combines tissues of the lungs, aorta, coronary artery, thyroid gland, and visceral and subcutaneous adipose tissues; the second cluster unites tissues of the other organs included in the comparison; the greatest differences in the expression level were observed in blood cells.

Fig. 1. Heat map of the expression level of the candidate genes for the susceptibility to SARS-CoV-2 in various human organs/tissues (constructed from median values of the number of transcripts per million (TPM) according to [45]). The expression level is normalized by gene, so that zero corresponds to the average level of expression of that gene. The values in the cells reflect the number of statistically significant eQTLs affecting the level of gene expression in the respective tissues.
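The per-gene normalization mentioned in the caption can be sketched as follows. This is a minimal illustration, assuming a standard z-score (subtract the gene's mean across tissues, divide by its standard deviation); the exact scaling used for Fig. 1 is not stated in the text, and the TPM values below are hypothetical.

```python
from statistics import mean, pstdev


def center_per_gene(tpm_by_tissue: dict) -> dict:
    """Rescale one gene's median TPM values so that 0 is its mean across tissues."""
    values = list(tpm_by_tissue.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # The gene is expressed at the same level everywhere: no contrast to show.
        return {tissue: 0.0 for tissue in tpm_by_tissue}
    return {tissue: (v - mu) / sigma for tissue, v in tpm_by_tissue.items()}


# Hypothetical median TPM values for a single gene (not taken from [45]).
example = {"lung": 30.0, "whole blood": 0.0, "thyroid": 60.0}
scaled = center_per_gene(example)
for tissue, z in scaled.items():
    print(f"{tissue}: {z:+.2f}")
```

Normalizing each gene separately makes tissues comparable within a gene, but the resulting values say nothing about which gene is more abundant in absolute terms.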

No differences in the expression level of the genes ACE2, TMPRSS2, CTSB, and CTSL between men and women were found [67]. Protein coexpression is an important condition for the entry into the cells and the spread of viruses in the human body [68], while organs and tissues where these genes are expressed can potentially be susceptible to coronavirus infection [69].

Individual differences in the susceptibility to SARS-CoV-2 may be affected by variants in the candidate genes that alter protein structure or expression. In addition, gene expression may depend on the methylation level of their promoter regions.

Genetic Variants Affecting Protein Structure

Nonsynonymous substitutions are recorded in all candidate genes for the susceptibility to SARS-CoV-2, but in most cases they are extremely rare, show low polymorphism, or are not recorded in all ethnoterritorial groups (Table 1; [44]). This may be associated with the functional significance of the proteins and enzymes encoded by these genes, as well as with their localization in the cell: all the considered proteins, except DDX1, are localized in the cell membrane [14]. At the same time, a large number of rare variants that are important for the entry of SARS-CoV-2 into the cells of the host organism and the development of infection were registered in the ACE2 gene, including S19P, I21T/V, E23K, A25T, K26R, T27A, E35D/K, E37K, Y50F, N51D/S, M62V, N64K, K68E, F72V, E75G, M82I, T92I, Q102P, G220S, H239Q, G326E, E329G, G352V, D355N, H378R, Q388L, P389H, E467K, H505R, R514G/*, and Y515C; interracial differences in the frequency of variants were recorded for some of them [67]. Since the ACE2 gene is localized on the X chromosome, the presence of such variants in men in the hemizygous state may act as an unfavorable factor for SARS-CoV-2 infection. In the candidate genes for the susceptibility to SARS-CoV-2, a large number of variants leading to amino acid substitutions or loss of protein function were identified with a frequency of less than 1% (see Table 1). It is possible that precisely such rare variants, both in the ACE2 gene [67] and in the other genes considered, act as genetic factors that determine individual susceptibility to SARS-CoV-2.
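The reason hemizygosity matters for an X-linked gene such as ACE2 can be shown with simple arithmetic. The sketch below assumes Hardy-Weinberg proportions and equal allele frequencies in both sexes; both are simplifying assumptions, and the allele frequency used is hypothetical.

```python
def x_linked_carrier_fractions(q: float) -> dict:
    """Expected genotype fractions for an X-linked variant with allele frequency q."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    return {
        "hemizygous_males": q,                    # one X: the variant is the only copy
        "homozygous_females": q * q,              # two X: both copies carry the variant
        "heterozygous_females": 2 * q * (1 - q),  # one variant copy, one reference copy
    }


# Hypothetical rare variant with a frequency of 1%.
fractions = x_linked_carrier_fractions(0.01)
for group, value in fractions.items():
    print(f"{group}: {value:.4f}")
```

For a rare variant, men who carry it as their only copy are far more common than women who carry it on both chromosomes, which is why rare ACE2 variants are expected to show up phenotypically mostly in men.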

The highly polymorphic missense variants in the candidate genes for the susceptibility to SARS-CoV-2 in all territorial groups characterized within the 1000 Genomes Project (Phase 3) include rs12329760 (p.Val160Met) of the TMPRSS2 gene; rs25653 (p.Arg86Gln), rs8192297 (p.Ile603Met), and rs25651 (p.Ser752Asn) of the ANPEP gene; rs12338 (p.Leu26Met, p.Leu26Val) of the CTSB gene; and rs1059091 (p.Ile121Leu) of the IFITM2 gene (Table 1). Moreover, rs12329760 of the TMPRSS2 gene, rs12338 of the CTSB gene, and rs1059091 of the IFITM2 gene are classified as potentially pathogenic according to the programs SIFT, PolyPhen, Mutation Assessor, and MetaLR, built into Ensembl [44].

For most missense variants, the allele frequency differs twofold or more between the studied ethnoterritorial groups (Table 1). SNPs whose level of polymorphism differs significantly between regions of the world (in some cases, the frequency of the rare allele varies from 0 to 42%) are also of interest. Thus, according to the 1000 Genomes Project [44], the A allele for rs75603675 (p.Gly8Asp) of the TMPRSS2 gene is recorded with a frequency of 2% in populations of East Asia, 22% in populations of South Asia, 0% in Africa, 27% in America, and 40% in European populations; the frequency of the T allele for rs1058900 (p.Val33Ala) of the IFITM2 gene is 0, 7, 4, 20, and 42%, respectively; the frequency of the T allele for rs14408 (p.Met41Arg) of the IFITM2 gene is 4, 27, 24, 33, and 60%, respectively.
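The spread of these frequencies between regions can be summarized with a few lines of code. The sketch below uses only the percentages quoted above from [44]; the function name and the summary fields are illustrative choices, not part of the cited resource.

```python
POPULATIONS = ("East Asia", "South Asia", "Africa", "America", "Europe")

# Allele frequencies in percent, in the order of POPULATIONS, as quoted in the text.
ALLELE_FREQ_PERCENT = {
    "rs75603675 (TMPRSS2, A allele)": (2, 22, 0, 27, 40),
    "rs1058900 (IFITM2, T allele)": (0, 7, 4, 20, 42),
    "rs14408 (IFITM2, T allele)": (4, 27, 24, 33, 60),
}


def frequency_span(freqs):
    """Return min, max, absolute range, and fold difference (None if min is 0)."""
    lo, hi = min(freqs), max(freqs)
    fold = hi / lo if lo > 0 else None
    return {"min": lo, "max": hi, "range": hi - lo, "fold": fold}


for snp, freqs in ALLELE_FREQ_PERCENT.items():
    span = frequency_span(freqs)
    lowest = POPULATIONS[freqs.index(span["min"])]
    highest = POPULATIONS[freqs.index(span["max"])]
    print(f"{snp}: {span['min']}% ({lowest}) to {span['max']}% ({highest})")
```

The fold difference is left undefined when the allele is absent in one region, since a ratio against zero is not meaningful; the absolute range is the safer summary in that case.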

Missense substitutions in a functionally significant region of the protein molecule can play an important role in the formation of the susceptibility to coronaviruses. For example, the p.Val160Met substitution (rs12329760 of the TMPRSS2 gene) lies in the functional SRCR domain of the protein, which mediates protein-protein interactions and ligand binding [14]. In the ANPEP gene, variants, including those leading to nonsynonymous substitutions, were identified in the region critical for binding to HCoV-229E (amino acids 260–353) [14, 50]. Such substitutions can affect the properties of proteins and, consequently, the degree of susceptibility to coronavirus infection in carriers of different genotypes for these variants.

Among the polymorphic variants localized in exons of the candidate genes for the susceptibility to SARS-CoV-2 involved in the consideration, pathogenetic significance in relation to the risk of developing infectious diseases was previously established only for one SNP, rs12252 of the IFITM3 gene (OMIM: 605579, predisposition to severe course of influenza) [70, 71]. An association with a higher risk of severe H1N1/09 influenza was recorded for individuals with the C allele for rs12252 [71]. The authors of the cited study showed in vitro that cell lines with the CC genotype were characterized by a lower level of IFITM3 expression and a higher susceptibility to infection than lines with the TT genotype for this SNP. Despite the fact that the rs12252 polymorphic variant leads to the synonymous substitution (p.Ser14Ser), for some transcripts, this substitution is located in the region of splicing or 5'UTR [44]. Cells expressing the protein shortened by 22 amino acids are unable to restrict viral replication as compared with the wild-type IFITM3 protein [71]. The frequency of the unfavorable C allele varies from 4% in European populations to 53% in South Asian populations [44].

It was shown that carriers of the CC rs12252 genotype of the IFITM3 gene also have a risk of a more severe course of COVID-19 (the proportion of carriers of this genotype is 28.57% among patients with moderate disease, 50% among those with the severe course, and 66.7% among the dead) [12]. This study was performed on a small sample. Nevertheless, several earlier findings support it: the association of rs12252 with the severe course of H1N1/09 influenza [70, 71]; the lightning-fast development of viral pneumonia with a severe clinical picture in mice with knockout of the Ifitm3 gene infected with low-pathogenicity influenza virus [71]; the inability of CC homozygotes to limit replication of influenza H1N1/09 virus when the truncated IFITM3 protein is synthesized [71]; and the critical role of the 21 N-terminal amino acids of IFITM3 (as well as the C-terminal transmembrane region of the protein) for antiviral activity against vesicular stomatitis virus in vitro [72]. Taken together, these data suggest that this variant may have a high predictive value in assessing the risk of developing COVID-19 and in determining the course of the infection caused by SARS-CoV-2.

Genetic Variants with Regulatory Potential

Numerous SNPs, including variants that can affect the level of gene expression, are recorded in the candidate genes for the susceptibility to SARS-CoV-2. eQTLs are known for these genes, and their number and spectrum differ between organs and tissues (see Table 1, Fig. 1; see also [45]). For example, a higher expression level of the TMPRSS2 gene in the lungs is typical of carriers of the AA genotype for rs35074065, the TT genotype for rs34783969, the AA genotype for rs463727, etc., compared with other genotypes. In individuals with the AA genotype for rs1622599, a higher expression level of the CTSB gene was observed in lung tissues, blood cells, pancreas, and tibial artery, but a lower level was observed in the mucous membrane of the esophagus and in the thyroid gland [45].

The eQTL category for the TMPRSS2 gene includes rs2070788 and rs383510, whose GG and TT genotypes, respectively, are characterized by a higher level of this gene expression (including in the lung tissue) [45]. The same genotypes are associated with the susceptibility to influenza A (H7N9) virus and with the risk of severe influenza A (H1N1) [73]. Differences in the frequency of unfavorable alleles and, consequently, genotypes of the eQTL series are recorded between territorial groups of the population [44]. For example, the unfavorable G allele for rs2070788 of the TMPRSS2 gene occurred with the frequency of 27% in African populations, 36% in East Asian populations, and more than 46% in European, South Asian, and American populations.
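Allele frequencies translate into genotype frequencies only under additional assumptions. The sketch below converts the G allele frequencies quoted above for rs2070788 into expected GG genotype frequencies under Hardy-Weinberg equilibrium, which is an assumption introduced here and not a result reported in [44]. Because the text gives "more than 46%" for three regions, 0.46 is used as a lower bound for them.

```python
def hwe_genotype_frequencies(p: float) -> dict:
    """Expected genotype frequencies for an autosomal allele with frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    q = 1.0 - p
    return {"homozygous_effect": p * p, "heterozygous": 2 * p * q, "homozygous_other": q * q}


# G allele frequencies for rs2070788 as quoted in the text.
G_ALLELE_FREQ = {
    "Africa": 0.27,
    "East Asia": 0.36,
    "Europe, South Asia, America (lower bound)": 0.46,
}

for population, p in G_ALLELE_FREQ.items():
    gg = hwe_genotype_frequencies(p)["homozygous_effect"]
    print(f"{population}: expected GG frequency {gg:.3f}")
```

Because the homozygote frequency grows with the square of the allele frequency, moderate differences in allele frequency between regions produce larger relative differences in the share of GG carriers.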

Interestingly, the TMPRSS2 gene is located in close proximity to the genes MX1 and MX2, which encode two interferon-induced dynamin-like GTPases. MX1 has antiviral activity against a wide range of RNA viruses and some DNA viruses; MX2 is characterized by a strong antiviral effect against human immunodeficiency virus type 1. Markers rs2070788 and rs383510 are also eQTLs for the MX1 gene [14, 45, 74].

The A allele for rs17514846 located in the intron of the FURIN gene [75–77], associated with diseases of the cardiovascular system (hypertension, ischemic heart disease, atherosclerosis of the coronary arteries), leads to a higher transcriptional activity of this gene in the endothelial cells of the vessels than the alternative allele [45, 78]. These data are in good agreement with clinical observations according to which a higher susceptibility to SARS-CoV-2 infection and a more severe course of COVID-19 are observed in people with diseases of the cardiovascular system [79]. The frequency of registration of the unfavorable A allele for rs17514846 varies from 16% in populations of East Asia to 88% in African populations; in European populations, the frequency of this allele is 46% [44].

The genotypes for rs3118869 and rs2378757 affect the expression level of the CTSL gene [45, 80]. For the first of these eQTLs, the dependence of the expression level on genotype differed between organs/tissues: a higher level of CTSL expression was recorded in carriers of the C allele in the skin, tibial nerve, and colon; in carriers of the homozygous AA and CC genotypes in subcutaneous adipose tissue and tibial artery; and in carriers of the heterozygous genotype in the blood. N. Mbewe-Campbell et al. [80] showed that the CC genotype of rs3118869, which is located in the CTSL gene promoter, is associated with arterial hypertension. It is possible that patients with arterial hypertension and the CC rs3118869 genotype of the CTSL gene are more susceptible to SARS-CoV-2 and to damage to those organs where the carriage of the CC genotype leads to a higher level of expression of this gene. For rs2378757, in 11 organs/tissues (including the lungs, aorta, and subcutaneous adipose tissue), the level of CTSL gene expression decreased as the number of A alleles decreased: AA > AC > CC.
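An ordered pattern such as AA > AC > CC is what an additive eQTL model describes: expression is regressed on the number of copies of one allele. The sketch below fits that slope by ordinary least squares. It is a simplified illustration, not the pipeline behind [45], and the expression values are hypothetical.

```python
def additive_eqtl_slope(dosages, expression):
    """Least-squares slope of expression on allele dosage (0, 1, or 2 copies)."""
    n = len(dosages)
    if n != len(expression) or n < 2:
        raise ValueError("need two equally long samples with at least two points")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("all samples share one genotype; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    return sxy / sxx


# Dosage of the A allele: CC = 0, AC = 1, AA = 2 (hypothetical samples).
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]  # hypothetical normalized expression
slope = additive_eqtl_slope(dosages, expression)
print(f"change in expression per A allele: {slope:.2f}")
```

A positive slope corresponds to the order AA > AC > CC. A pattern in which heterozygotes or both homozygotes stand out, as described for rs3118869 in some tissues, is not captured by a single additive slope.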

The specificity of expression profiles of the candidate genes for the susceptibility to SARS-CoV-2 (including because of the genetic characteristics of individuals) can determine interpopulation differences in the spread of COVID-19. In particular, some researchers associate different clinical scenarios of the course of COVID-19 in the regions of the world (in particular, in China and Italy) with the peculiarities of the expression level of the FURIN gene (one of the key genes; its product is involved in the process of the coronavirus entry into the host cell) [81]. For the FURIN gene, about 300 eQTLs are known (Table 1), but their number (Fig. 1) and spectrum differ in different organs and tissues [45].

The given examples of functionally significant polymorphic variants do not reflect their entire diversity. A. Paniri et al. [82], using in silico prediction of the functional significance of TMPRSS2 gene variants, identified 21 SNPs that affect the function and structure of the protein encoded by it: folding, posttranslational modification, splicing, and the regulatory potential of microRNA. In particular, rs12329760 changes the shape of the protein in such a way that a pocket of the protein is formed de novo; rs875393 creates a new donor site and sites in silencers and destroys three sites in the enhancer motif. A number of SNPs (rs12627374, rs456142, rs462574, rs456298) lead to the destruction/creation of binding sites and weakening or enhancement of the effects of various microRNAs (hsa-miR-548c-3p, hsa-miR-127-3p, hsa-miR-1324, hsa-miR-5089, hsa-miR-204-5p, hsa-miR-211-5p, hsa-miR-4685-3p, hsa-miR-4716-5p). All these structural and functional features of the TMPRSS2 gene, like those of other candidate genes for the susceptibility to SARS-CoV-2, may underlie individual differences in the susceptibility to this infectious disease and determine the nature of its course.

Factors Affecting Expression of Candidate Genes for the Susceptibility to SARS-CoV-2

Clinical observations show that the presence of pathological conditions (diseases of the cardiovascular system and kidneys, endocrine disorders) and old age act as risk factors for the development, as well as the severe course of COVID-19 [8, 64, 83, 84]. In some diseases and with increasing age, changes in the expression level of the candidate genes for the susceptibility to SARS-CoV-2, including because of epigenetic modifications, may be recorded.

Expression of the ACE2 and TMPRSS2 genes in the sustentacular (accessory) cells of the olfactory epithelium increases with age (shown in the model objects) [85]. Older people have higher DPP4 activity [86] and CTSB level (CTSL level did not change) in various tissues and organs [87]. These observations may explain a higher susceptibility to COVID-19 in elderly people [88].

In many diseases, a change in the expression level of the candidate genes for the susceptibility to SARS-CoV-2 is recorded. In patients with idiopathic dilated cardiomyopathy and ischemic cardiomyopathy, expression of the ACE2 gene in the ventricular myocardium is increased [66]. At the same time, a higher risk of developing COVID-19 and complications from the cardiovascular system is shown in persons suffering from diseases of this system [83, 84, 88].

It should be noted that the level of ACE2 expression in scrapings of epithelial cells of the nose and bronchi in patients with bronchial asthma and respiratory allergies was reduced and showed an inverse relationship to the severity of the clinical picture (ACE2 expression was the lowest in patients with a high level of allergic sensitization and with allergic asthma), but nonatopic asthma was not associated with decreased expression of this gene [89]. The authors note that these data may explain such an unexpected clinical observation as a milder course of COVID-19 in patients with respiratory allergies and at the same time emphasize that ACE2 expression is one of the factors that may influence the response of these individuals to SARS-CoV-2 infection. Indeed, an increase in expression of the DPP4 and ANPEP genes was observed in the bronchi in allergic asthma [90], which may also affect the susceptibility to SARS-CoV-2. In addition, expression of all the candidate genes for the susceptibility to SARS-CoV-2 involved in the consideration is recorded in the lungs, and eQTLs affecting the expression level in the lungs were established for many genes (Fig. 1). Perhaps, the inconsistency of the results in relation to the effect of asthma on the risk of development and the nature of the course of COVID-19 is associated with the genetic characteristics of territorial population groups. Thus, Z. Zhu et al. [91] note that adults with asthma have a higher risk of severe COVID-19, and an increased risk was observed precisely among patients with nonallergic asthma.

An increase in the expression level of the DPP4 gene was registered in the alveolar epithelium and alveolar macrophages in patients with chronic obstructive pulmonary disease (COPD), with cystic fibrosis [92], with insulin resistance (shown in model animals and in cell culture) [93], and under hypoxic conditions [47]. An increased level of the DPP4 protein is considered as a predictor for developing metabolic syndrome [86]. In persons with type 2 diabetes mellitus (T2DM), a higher level and activity of DPP4 was observed in the plasma compared with healthy individuals; an increased level of this protein (but not activity) was also recorded in patients with T2DM and obesity compared with T2DM without obesity [94].

Drugs can have modifying effects on the regulation of the work of the candidate genes for the susceptibility to SARS-CoV-2 and the synthesis of proteins encoded by them. Thus, the expression level of the TMPRSS11D gene in lung biopsies of patients with COPD depended on the drug administration (fluticasone and salmeterol, a combination of inhaled steroids and long-acting beta-agonists) [95]. There were no differences in the expression level of the ACE2 and TMPRSS2 genes in the ileum and colon between patients with inflammatory bowel diseases and healthy individuals, but drug therapy in patients (in particular, administration of tumor necrosis factor blockers) led to a decrease in the expression level of ACE2 [96].

The level of expression of the candidate genes for the susceptibility to SARS-CoV-2 can be modified by the presence of other viruses in the body, and the direction of the change in expression depends on the infectious agent. Somewhat unexpectedly, some viruses (in particular, H5N1 avian influenza virus) increase expression of the IFITM1, IFITM2, and IFITM3 genes [97], which have an antiviral effect [57–59, 63]. At the same time, human papillomaviruses of high oncogenic risk (hrHPV) suppress expression of the IFITM1 gene in keratinocytes as early as 48 h after infection [98]. It is possible that a change in the expression level under the influence of viruses depends not only on the infectious agent but also on the duration of the infectious process, as well as on the genetic characteristics of the organism and its functional state at the time of infection.

Other exogenous factors can also act as modifiers of the functioning of the candidate genes for the susceptibility to SARS-CoV-2. In the lungs of smokers, an increase in expression of the ACE2 and FURIN genes was found, but smoking had no effect on the expression level of the TMPRSS2 gene [99]. Smoking is associated with increased expression of the ACE2 gene in adipose tissue [100]. Some researchers suggest that adipose tissue in obese individuals may act as a source for a wider spread of viruses, their increased release, immune activation, and amplification of cytokines [64].

The effect of smoking on the methylation level of the cg23161492 site located in intron 1 of the ANPEP gene (smoking reduces the level of DNA methylation) in blood leukocytes was shown in a number of studies [101–104]. A change in the methylation level of this CpG site affects expression of the ANPEP gene in lung tissue [105]. Interestingly, six genetic variants (rs28565347, rs12442778, rs12440213, rs8030857, rs8031576, and rs748508), which can affect the methylation pattern of CpG sites (mQTLs) in blood leukocytes, were identified for cg23161492 [106]. The methylation level of another site, cg05312779, located in the 3'UTR of the ANPEP gene, is statistically significantly associated with lung function; in particular, an increase in DNA methylation was associated with a decrease in the Tiffeneau index in adolescent girls [107]. These findings are consistent with clinical observations that smokers have a higher risk of developing COVID-19 symptoms when infected with SARS-CoV-2 [100]. However, assessments of the impact of smoking on the risk of developing COVID-19 are ambiguous (see, for example, [65]), which once again emphasizes the complexity of regulating the response of the host organism to infection with SARS-CoV-2.

Disturbance of the level of DNA methylation in other genes is also considered as a factor modulating the response to SARS-CoV-2 infection [108]. It is assumed that hypomethylation of the regulatory regions of the ACE2 gene in CD4+ T cells in patients with systemic lupus erythematosus, owing to oxidative stress caused by viral infection, can lead to an increase in expression of the ACE2 gene [108]. Hypomethylation of CpG sites and a high level of expression of the ACE2 gene are typical of many cancers [109], and this may explain the high risk of complications in patients with COVID-19 and such comorbidities [83].

Thus, despite the short period of research, data have already accumulated indicating that the susceptibility to infection caused by SARS-CoV-2 and the nature of its course may be determined by the genetic characteristics of both the coronavirus and the human body. The structural and functional features of the SARS-CoV-2 S protein and of human genes encoding receptors, proteases, and other proteins and enzymes that are important for the entry and replication of the coronavirus, as well as factors modifying the functioning of these genes (including the presence of concomitant diseases, harmful habits, infection with other viruses, drug administration, etc.), can determine the susceptibility and clinical picture of COVID-19 at the individual level and the features of the epidemiological situation at the population level.