The major histocompatibility complex (MHC) molecules are cell surface protein complexes encoded in the human leukocyte antigens (HLA) locus. The HLA locus is highly polymorphic with thousands of different alleles and millions of haplotypes reported [1, 2]. HLAs are associated with several infectious diseases. Previous disease association studies showed that some HLA haplotypes are highly correlated with multiple viral infections [3,4,5] and have been reported to confer differential susceptibility to infection and the severity of the resulting disease [6]. In particular, HLA class I alleles are key players in the immune defense against viruses as they initiate the activity of T cells against the invading intracellular pathogens [7, 8]. Although the presentation of viral antigens relies classically on MHC class I molecules, MHC class II genes have also been associated with the outcome of many viral infections [9,10,11]. Studies in a range of species [12,13,14,15], including humans [16, 17], imply a heterozygous selection mechanism operating on the HLA loci as an explanation for the extensive variability of the MHC molecules [18, 19]. However, recent results on human populations suggest that such an advantage may be limited [20].

In the context of COVID-19, specific HLAs have been associated with coronaviruses in the past, including SARS coronavirus infections [21, 22]. Several efforts were made to identify HLA-related susceptibility to SARS-CoV-1 after the first SARS epidemic in East Asia [23,24,25,26] and the MERS-CoV outbreak in 2014 in Saudi Arabia [27]. Most of these case studies provided weak or conflicting results and required further validation due to the relatively small sample size. An association between the geographic distributions of COVID-19 and HLA alleles has been suggested [28]. Similarly, an association between severe disease outcome and specific HLAs was suggested by another small-scale whole-genome sequencing study [29]. In silico model raised the possibility that HLA supertypes may have a role in the severity of the disease [30]. Moreover, expression of human leukocyte antigen class DR (HLA-DR) was decreased [31] or contrarily increased [32] in peripheral blood mononuclear cells of COVID-19 patients. Nevertheless, genome-wide association study (GWAS) among 1980 Italian and Spanish patients with severe COVID-19 detected a few loci associated with that state but no association with the HLA loci at chromosome 6 or heterozygote advantage was found [33]. However, another GWAS study analyzing 2244 critically ill COVID-19 patients in the UK detected three variants in chromosome 6 but these variants were not replicated in other studies. Since population stratification in chromosome 6 (the major histocompatibility complex) is difficult to control, further studies will be required to determine whether these associations are real [34]. Two other large unpublished GWAS studies did not report a signal at the HLA locus on chromosome 6 [35, 36]. To the best of our knowledge, there are no previous studies looking specifically at HLA subgroups among individuals with COVID-19.

HLA allele and haplotype frequencies differ between populations belonging to different ethnic groups [37, 38]. For example, among Jewish populations, similarities of HLA alleles may be seen within ethnicities, allowing an estimation of an individual’s origin based on his HLA genotypes [39]. It has been estimated that decreased HLA variability may assist in detecting signals relevant to a specific disease state. Indeed, specific HLAs were associated with different disorders in the Ashkenazi population, a large genetically isolated group [40,41,42]. Given the diversity of HLAs among populations, testing genetically isolated and genetically homogenous populations may provide an advantage in detecting the effects of specific genetic variants, in particular in Ashkenazi Jews. Note that this is not a genome-wide associate study, where the case and controls can be matched by the PCA components of their genotypes. In HLA association studies, self-reporting is the standard [43, 44].

Our study aimed to utilize the presence of a large-scale cohort having HLA genotypes and a high rate of COVID-19 to test the possible role of specific HLA alleles in SARS-CoV-2 infection and its severity. Using a dataset of this magnitude also allowed us to analyze associations between COVID-19 infection and HLA zygosity. Understanding how variation in HLA may affect both susceptibility and severity of COVID-19 infection could help identify and stratify individuals at higher risk for the disease and may support future vaccination strategies.


Study Design

A case-control study with a test-negative design [45] was used to evaluate the infection outcome, while a retrospective cohort study was used to evaluate the hospitalization outcome. In the case-control design, patients who tested positive or negative for SARS-CoV-2 were evaluated for their HLA alleles. In the cohort design, patients who were diagnosed with COVID-19 were followed up to see who were hospitalized.

The study was based on patient data taken from the data warehouse of Clalit Health Care (CHS), a large integrated payer-provider healthcare organization operating in Israel. CHS insures over 4.6 million Israelis (~50%), which are a representative sample of the entire population.

HLA Dataset

HLA alleles were obtained from the Ezer Mizion Bone Marrow Donor Registry, which enlists the highest number of registered unrelated volunteer donors per capita in the world. The HLA resolution varies from serologic (8%) to DNA-based testing at low (32%), intermediate (22%), and allele resolution (38%) [46]. The initial analysis included HLA genotypes of 1,040,250 donors which represent all the populations composing the Ezer Mizion registry. High-resolution five-locus typing for HLA-A, -B, -C, -DRB1, and -DQB1 was imputed for all donors using GRIMM (GRaph IMputation and Matching for HLA Genotypes) [47], which produces the most likely five-locus genotype consistent with the low-resolution typing and the self-defined ethnicity. For each subject in the dataset, we chose the unphased genotype with the highest probability. We also used the low-resolution two-digit typing of each subject, as well as the homozygosity status of each allele.

Cases and Controls

The study included all patients tested for SARS-CoV-2 by PCR that were members of CHS at the date of testing and had HLA data available. Cases were those patients tested positive, while controls were patients tested negative. Covariates for adjustment included age, sex, number of children in the household, population sector (ultra-orthodox vs. general, which has had an important impact on the epidemic in Israel), socioeconomic status, and ethnic origin (Ashkenazi vs. other). The analyses for the hospitalization outcome were further adjusted for obesity and history cardiovascular disease, diabetes, chronic respiratory disease, chronic renal disease, chronic hepatic disease, chronic neurological disease, chronic hematological disease, and immunosuppression. These covariates were extracted as of the test date. The number of children in the household was calculated based on CHS’ demographic registry. The population sector was determined using the patient’s primary-care clinic address. Socioeconomic status was determined based on a patient’s home address. Patients were defined as Ashkenazi if they were Jewish and were born themselves or had at least one parent or 2 grandparents born in central or western Europe. A severe disease course was defined by hospitalization less than 2 weeks after the molecular diagnosis.

Statistical Analysis

The association between each HLA allele and the outcome, adjusted for the covariates listed above, was tested separately for each allele using logistic regression, where all the covariates and the presence or absence of the HLA were used as an input, and the outcome (infection or hospitalization) was used as the prediction. The effect of homozygosity was tested similarly, as a binary predictor, separately for each locus, and a similar logistic regression was performed on the same outputs. A Bonferroni correction was applied to the p-values and confidence intervals. The results were similar when the Benjamini-Hochberg false discovery rate was controlled [48]. Zygosity was computed based on two-digit HLA alleles at the appropriate locus. The study was performed with the approval of the Institutional Research Community (IRB) #0080-20-COM2).


Overall, 72,912 individuals with HLA genotyping and at least one PCR test for COVID-19 were included in the study between March 1, 2020, and October 31, 2020. Of them, 8.8% (6413) tested positive for SARS-CoV-2 by PCR. Demographic and socioeconomic data related to the population are available in Table 1. The sample is large enough to allow for the detection of OR with a significance of 1.3 (Supplemental Figure S1).

Table 1 Demographic characterization of study groups

All HLA alleles were grouped by two-digit representations to minimize the number of tests, and only alleles with a frequency of at least 1 % were used. A total of 16 HLA-A, 20 HLA-B, 13 HLA-C, 5 HLA-DQB1, and 12 HLA-DRB1 (Table 2) met the criteria for analysis. No significant differences in HLA- allele frequency were detected between individuals with negative and positive SARS-CoV-2 PCR (Fig. 1, left panel).

Table 2 HLA- subgroups (2-digit) analyzed
Fig. 1
figure 1

Odds ratio (OR) as a function of HLA allele. In all figures, the dot represents the odds ratio, and the interval represents the 95% confidence intervals (Bonferroni correction). The left plot is for SARS-CoV-2 infection and the right plot is for hospitalization. We trim error bars at 3 for better visualization. The analysis presented here is for the full population. One can see that no HLA allele is significantly associated

We further tested whether HLA genotypes may be associated with severe COVID-19 infection. Of the 6413 infected individuals, 181 (2.6%) were hospitalized within 2 weeks. Again, no significant HLA genotypes were found to be associated with hospitalization among individuals positive for SARS-CoV-2 (Fig. 1, right panel).

Given the association between HLA and ethnic origin and the fact that the Ashkenazi Jewish population is genetically isolated [36], we tested for SARS-CoV-2 presence and severity within the Ashkenazi population. This analysis included 20,937 Ashkenazi donors, of which 1495 were positive for SARS-CoV-2 and 45 were hospitalized. No significant association was observed between either SARS-CoV-2 positive frequency (Fig. 2, left panel) or COVID-19 severity (Fig. 2, right panel) and any HLA subgroup in the Ashkenazi population (Fig. 2).

Fig. 2
figure 2

Odds ratio (OR) as a function of HLA allele in the Ashkenazi population. In all figures, the dot represents the odds ratio, and the interval represents the 95% confidence intervals (Bonferroni correction). The left plot is for SARS-CoV-2 infection and the right plot is for hospitalization. We trim error bars at 3 for better visualization. The analysis presented here is for the Ashkenazi population. One can see that no HLA allele is significantly associated

To investigate whether HLA heterozygosity is associated with the outcome of SARS-CoV-2 infection, we further tested whether an increased probability of SARS-CoV-2 infection or severe disease course is associated with a higher homozygous rate of each of the 5 HLA loci subgroup. Homozygosity was defined at the two-digit level. No such association was detected either in the general population or in the Ashkenazi population (Fig. 3).

Fig. 3
figure 3

Effect of homozygosity on either infection or hospitalization in the entire studied population or the Ashkenazi population. Again no significant association was found. Homozygosity at a given locus is defined as having the same low-resolution allele in this locus

To ensure our results were not the results of lowering the resolution of the HLA, we repeated the analysis above for both infection and hospitalization for the general population and the Ashkenazi population using the 50 most frequent HLA alleles in the population (4 digits), and none had a significant effect in any of these tests (Supplemental Table S1).


The COVID-19 pandemic is associated with heavy medical, economic, and social costs. A few factors were associated with decreased or increased risks of SARS-CoV-2 infection, including smoking [49] and gender (male) [50]. Genetic factors were also found to be associated with COVID-19 disease severity [51], including a genomic segment of around 50 kilobases in size that is inherited from Neanderthals [52]. Similarly, a meta-analysis of GWAS studies of patients from Spain and Italy with severe COVID-19, which was defined as a hospitalization with respiratory failure, detected three significant loci with odd ratios of 1.32–1.77 per locus [33]. Given the important role of HLA in several viral infections, the above study analyzed the extended HLA region (chromosome 6, 25 through 34 Mb) but did not find significant SNP association signals at the HLA complex. Another GWAS study analyzing 2244 critically ill COVID-19 patients in the UK detected potential associations between variants in chromosome 6 and COVID-19 [34].

Our study, performed on a large-scale dataset, did not detect any specific HLA subgroups associated with increased or decreased risk for SARS-CoV-2 infection, or a severe disease course causing hospitalization. We also did not detect any associations of heterozygote advantage with COVID-19 when we analyzed the entire study population or when analyzing a sub-population. Despite clear successes in identifying novel disease susceptibility genes, GWAS approaches have not been without controversy [53]. A major limitation of this approach is the need to adopt a high level of significance to account for the multiple tests burden due to the very large number of loci compared concurrently. As such, it can miss important associations. In contrast with GWAS studies, we incorporated HLA alleles with a prevalence of >1% in the Israeli population. This approach enabled a large-scale analysis not of a few HLA subtypes but of 66 variants, and still no association was found in the HLA region

Two main caveats should be considered. First, the imputation of HLA alleles may induce errors. However, the precision of the imputation was previously established [54]. Moreover, no association was found with the A alleles that were typed in the vast majority of subjects. Finally, the analysis was performed at the two-digit level, which is practically unaffected by imputation. Another possible limitation is the age distribution. Our study included only a small proportion of hospitalized cases 181/6413 (2.6%). This may be related to the average young age of individuals for whom we have HLA data. However, this should not affect the results related to SARS-CoV-2 infection. While hospitalization alone may not be an indicator of disease severity, we expect a significant enrichment of severe cases in hospitalized patients.

The lack of HLA association reported here and in previous smaller studies, and the absence of advantage for heterozygote may suggest that heterozygote advantage in HLA may be more limited in general than often considered. This is in agreement with more recent claims that other mechanisms beyond heterozygote advantage or pathogen-driven balancing selection may be the source of the large HLA polymorphism [2, 20].

We did not detect an HLA association with the initial phases of infection and hospitalization. As in other diseases, the initial response to the disease in the first few days is mainly mediated by the innate immune response, limiting the involvement of HLA and T cells. HLA may still be involved in the later phases of the disease progression, such as long-term sequels, or complications following hospitalization. A long-term follow-up will be needed to test for those.