Background

Clinical variability is high in unvaccinated individuals infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), ranging from silent infection to lethal disease. In ~ 3% of cases, infection leads to critical COVID-19 pneumonia, requiring high-flow oxygen (O2 > 6 L/min), mechanical ventilation (non-invasive or by intubation), or extracorporeal membrane oxygenation (ECMO) [1]. Advanced age is by far the strongest predictor of COVID-19 severity, with the risk of death doubling every 5 years of age from childhood onward [2, 3]. Men are also at greater risk of death than women [3,4,5]. Genome-wide (GW) association studies have identified several common loci associated with COVID-19 severity, the most significant being a region on chromosome 3p21.31 that was introduced by archaic introgression from Neanderthals [6,7,8,9,10]. The risk haplotype encompasses six genes (SLC6A20, LZTFL1, CCR9, FYCO1, CXCR6, and XCR1) and confers an estimated OR per copy of between 1.6 and 2.1, with higher values for individuals under 60 years old [7, 11]. Twenty-four GW regions have been shown to be significantly associated with critical COVID-19 [10,11,12]. Four of these regions encompass genes involved in type I IFN immunity. The first, on chr12q24.13, containing protective variants, is also a Neanderthal haplotype [13] and includes the OAS1, OAS2, and OAS3 cluster, these interferon-stimulated genes (ISGs) being required for the activation of antiviral RNaseL. The second, a region on chr21q22.1, includes IFNAR2. The third, a region on chr19p13.2, includes TYK2. The fourth, a region on chr9p21, includes IFNA10. However, common variants have a modest effect size and explain only a very small fraction of the clinical variability [6, 8]. This prompted us to search for rare variants conferring a stronger predisposition to life-threatening COVID-19.

Through a candidate approach focusing on influenza susceptibility genes, the COVID Human Genetics Effort (CHGE [14]) provided proof-of-concept that autosomal inborn errors of TLR3-dependent and -independent type I interferon (IFN) immunity, including autosomal recessive (AR) deficiencies of IFNAR1 or IRF7, can underlie critical COVID-19 [15]. Other children with AR IFNAR1, IFNAR2, TBK1, or STAT2 deficiency were subsequently reported, as well as children with AR TYK2 deficiency [16,17,18,19,20] (Fig. 1). Some other groups were unable to replicate these findings, but the variants were not tested biochemically and it is unclear whether recessive defects were considered [11, 21,22,23]. There may also be other reasons for their findings [1, 24], the most important being the age distribution of the case cohorts. The other case cohorts were much older than ours (mean age of 66 vs. 52 years) and we found that inborn errors of immunity (IEI) were more frequent in patients under 60 years old [25]. Consistently, we recently reported that ~ 10% of children with moderate, severe, or critical COVID-19 pneumonia had recessive inborn errors of type I IFN immunity [19]. Moreover, older patients are more likely to carry pre-existing autoantibodies (auto-Abs) neutralizing type I IFN, which are found in about 15% of critical cases and up to 21% of patients over the age of 80 years [26, 27]. The presence of such auto-Abs has been replicated by at least 26 studies worldwide [28, 29], and we also recently showed that autoimmunity to type I IFNs is a strong common predictor of COVID-19 death in unvaccinated individuals, providing further evidence for the role of type I IFN immunity in life-threatening COVID-19 [29].

Fig. 1
figure 1

Type I IFN immunity genes associated with life-threatening COVID-19. Inborn errors of type I IFN immunity and autoantibodies neutralizing type I IFNs (α, β, ω) underlie life-threatening COVID-19 pneumonia by interfering with type I IFN immunity in respiratory epithelial cells (RECs) and blood plasmacytoid dendritic cells (pDCs). SARS-CoV-2 infection can induce type I IFN production in a TLR3-dependent manner in tissue-resident RECs (which express TLR3 but not TLR7) and in a TLR7-dependent manner in circulating pDCs (which express TLR7 but not TLR3). IRF7 is constitutively expressed in pDCs, at higher levels than in other cell types, whereas it is mostly induced by viral infection in RECs. Reported in red are the 13 genes (IFNAR1, IFNAR2, IRF3, IRF7, IRF9, IKBKG, STAT1, STAT2, TBK1, TICAM1, TLR3, TRAF3, and UNC93B1) investigated in a previous study [15]; TYK2 and TLR7 were subsequently shown to underlie severe COVID-19 [19, 30]

Using an unbiased X-wide gene burden test, we also identified X-linked recessive (XR) TLR7 deficiency in 17 male patients aged 7–71 years with critical COVID-19 pneumonia, accounting for ~ 1% of cases in men (Fig. 1) [30]. Moreover, six of the 11 TLR7 variants previously reported in patients from other studies were deleterious (carried by nine of 16 patients) [31,32,33,34,35,36], whereas the TLR7 variants in other studies were not disclosed [21, 22]. TLR3 senses viral dsRNA in respiratory epithelial cells, whereas TLR7 senses ssRNA in plasmacytoid dendritic cells [25, 28]. Both pathways induce the production of type I IFNs. TLR7 gain-of-function variants were recently shown to be associated with human systemic lupus erythematosus [37], providing an example of mirror genetic effects between infectious and inflammatory/autoimmune diseases [38]. Collectively, these findings suggest that type I IFNs are essential for protective immunity to SARS-CoV-2 in the respiratory tract, with insufficient type I IFN activity accounting for up to 15–20% of cases of life-threatening COVID-19. Despite this high proportion, the determinants of critical COVID-19 pneumonia remain to be identified in ~ 80% of cases. Here, we tested the hypotheses that other IEI may underlie critical COVID-19 pneumonia in at least some patients and that our initial findings could be replicated in a new cohort. With the CHGE, we performed a GW gene-based rare variant association analysis. This analysis was performed in both previously investigated patients who had not been screened at the GW level [15, 19, 30], and in newly recruited patients. We also tested the hypothesis that we could replicate our initial finding of an enrichment in pLOF variants of candidate type I IFN-related genes in newly recruited patients, given the controversy from other groups. We extended the analysis to two other type I IFN-related genes, TLR7 and TYK2, that we had recently found to be associated with critical COVID-19 [19, 30], and to branchpoint (BP) variants with a potentially strong impact on the splicing of the 15 type I IFN-related genes [39]. Finally, we refined the analysis of the type I IFN-related genes by taking age, sex, and zygosity into account.

Methods

Cohort

Since the beginning of the pandemic, we have enrolled more than 9000 individuals with SARS-CoV-2 infection and broad clinical manifestations from all over the world through the COVID Human Genetic Effort (CHGE). In this study, we focused on 3503 patients with life-threatening COVID-19 and 1373 individuals with asymptomatic/mild infection. Life-threatening COVID-19 cases were defined as patients with pneumonia who developed critical disease, whether pulmonary with high-flow oxygen (> 6 L/min) or mechanical ventilation [continuous positive airway pressure (CPAP), bilevel positive airway pressure (BIPAP), and intubation], septic shock, or any other type of organ damage requiring intensive care unit admission. We screened for the presence of autoantibodies (auto-Abs) against type I IFNs in all patients for whom plasma was available (N = 928), as previously described [26, 27], and we excluded 234 patients who tested positive for auto-Abs as they already have a major risk factor for developing critical COVID-19 [29]. In total, 3269 patients with life-threatening COVID-19 were included in the analysis. Among those 3269 patients, 1301 had been included in previous studies restricted to a short list of 18 candidate genes [15, 19] or to the X chromosome [30], and 1968 had not been studied before. Controls were defined as individuals infected with SARS-CoV-2 who remained asymptomatic or pauci-symptomatic, with the presence of mild, self-healing, ambulatory disease (N = 1373). The presence of infection was assessed on the basis of a positive PCR test and/or serological test and/or the presence of typical symptoms such as anosmia or agueusia after exposure to a confirmed COVID-19 case. Whole-exome (N = 2003 cases and 866 controls) or whole-genome (N = 1266 cases and 507 controls) sequencing was performed for the cases and controls, and high-quality variants were obtained from the sequencing data as detailed in the Additional file 1: Supplementary Methods.

Population stratification

Principal component analysis (PCA) was performed with PLINK v1.9 software [40] on a pruned subset of ~ 14,600 SNPs not in linkage disequilibrium (maximum r2 value for linkage disequilibrium 0.4 between pairs of SNPs) with a minor allele frequency (MAF) > 1%, call rate > 99%, and P value for departure from Hardy–Weinberg equilibrium > 10−5, as previously described [41]. Ethnic origin was inferred from the PCA as previously described [41].

Variant selection

For each gene, we considered several sets of candidate coding variants, defined according to (i) functional annotation: predicted loss-of-function (pLOF) variants only (including stop gain/lost, start lost, frameshift, or splice variants), or pLOF with missense and in-frame variants (MISSLOF); (ii) the gnomAD v2.1 allele frequency (AF): variants with a gnomAD allele frequency below 1%, 0.1%, or 0.01%; and (iii) Combined Annotation Dependent Depletion (CADD) score [42] for missense and in-frame variants: CADD score ≥ mutation significance cut-off (MSC) for the corresponding gene [43] or all variants regardless of the CADD score. We considered nine sets of variants in total: (1) pLOF variants with gnomAD AF < 1%; (2) pLOF variants with gnomAD AF < 0.1%; (3) pLOF variants with gnomAD AF < 0.01%; (4) MISSLOF with CADD > MSC and gnomAD AF < 1%; (5) MISSLOF with CADD > MSC and gnomAD AF < 0.1%; (6) MISSLOF with CADD > MSC and gnomAD AF < 0.01%; (7) MISSLOF with gnomAD AF < 1%; (8) MISSLOF with gnomAD AF < 0.1%; (9) MISSLOF with gnomAD AF < 0.01%.

Rare variant burden analysis

We performed a genome-wide gene-based rare variants burden analysis. For each gene, the genotypic information for candidate rare variants was summarized into a genetic score defined according to three genetic models: (1) co-dominant: samples were coded 2 if they carried at least one biallelic variant, 1 if they carried at least one monoallelic variant, and 0 otherwise; (2) heterozygous: samples were coded 1 if they carried at least one monoallelic variant and 0 otherwise; and (3) recessive: samples were coded 1 if they carried at least one biallelic variant and 0 otherwise. For the X chromosome, hemizygous males are considered to be equivalent to homozygous females. The association between the genetic score for each gene and the disease status was assessed with a logistic regression-based likelihood ratio test (LRT) from EPACTS (Efficient and Parallelizable Association Container Toolbox) [44] for the genome-wide burden analysis or R 3.6.0 [45] for the candidate type I IFN-related pathway. Firth’s bias correction, with the fast.logistf.fit function of EPACTS or the logistf function of the R logistf package [46], was applied if the P value of the LRT was below 0.05. Analyses were adjusted for sex, age (in years), and the first five PCs of the PCA In Firth’s regression, a penalty term is assigned to the standard maximum likelihood function used to estimate the parameters of a logistic regression model when there are rare events or when complete separation exists [47]. With no covariates, this corresponds to adding 0.5 to every cell of a 2 by 2 table of allele counts versus case–control status. For a given gene and variant set, the burden test was not performed if the number of carriers across all samples was below 3.

We used three analysis strategies: (1) joint analysis of all samples; (2) trans-ethnic meta-analysis: the analysis was stratified according to 7 inferred ancestry subgroups (African, North African, European, admixed American, Middle Eastern, South Asian, East Asian). For each subgroup, an ethnicity specific PCA was performed and used in the logistic regression model; and (3) trans-pipeline meta-analysis to account for heterogeneity due to the type of sequencing data: the analysis was stratified according to the type of data shared (FASTQ vs. VCF). Subgroup P values were subjected to further meta-analysis, accounting for the direction of the effect and sample size, with METAL [48].

Correction for multiple testing

For each gene, up to 9 burden tests were performed per genetic model. These tests were not independent; we therefore assessed the effective number of burden tests Meff with a method adapted from that described by Patin et al. [49], based on the approach of Li and Ji [50]. This approach makes use of the variance of the eigenvalues of the observed statistics correlation matrix to estimate Meff. The Bonferroni-corrected threshold was then defined as 0.05/Meff.

Odds ratio (OR) equality for homozygous/hemizygous versus heterozygous carriers of pLOF variants at type I IFN genes

We investigated whether the odds of critical COVID-19 differed for carriers and non-carriers of pLOF variants at the type I IFN immunity loci as a function of zygosity (homozygous/hemizygous vs heterozygous). In the full sample, we used LRT to compare a full Firth bias-corrected logistic regression model including two different parameters for carriers of pLOF as a function of zygosity (alternative hypothesis) with a Firth bias-corrected logistic regression model including only one parameter for carriers of pLOF, not taking zygosity into account (null hypothesis). The analysis was performed with the R logistf package.

Biochemical characterization of TLR7 variants with a luciferase reporter assay

We tested the TLR7 variants as previously described [30]. Briefly, TLR7 variants were generated by site-directed mutagenesis. The WT or variant alleles were re-introduced into a Myc-DDK-pCMV6 vector (Origene). HEK293T cells, which have no endogenous TLR7 expression, were transfected with 50 ng of Myc-DDK-pCMV6 vector, empty or containing the WT or a variant allele the reporter construct pGL4.32 (100 ng), and an expression vector for Renilla luciferase (10 ng), with the X-tremeGENE™ 9 DNA Transfection Reagent kit (Sigma-Aldrich). The pGL4.32 (luc2P/NF-κB–RE/Hygo) (Promega) reporter vector contains five copies of the NF-κB–responsive element (NF-κB–RE) linked to the luc2P luciferase reporter gene. After 24 h, the transfected cells were left unstimulated or were stimulated with R848 (1 μg/ml; resquimod), for activation via TLR7/8 (Invivogen), or R837 (5 μg/ml; imiquimod) (Invivogen), or CL264 (5 μg/ml; Invivogen), human TLR7-specific agonists, for 24 h. Relative luciferase activity was then determined by normalizing the values obtained against the firefly:Renilla luciferase signal ratio.

Results

Cohort description

Through the CHGE, we collected whole-exome sequencing (WES) or whole-genome sequencing (WGS) data for 3503 patients with life-threatening COVID-19 pneumonia (hereafter referred to as “patients”; see Supplemental Methods) and 1373 individuals with mild or asymptomatic infection, i.e., without pneumonia (hereafter referred to as “controls”). In total, 928 of the 3503 patients were screened for the presence of auto-Abs against type I IFN [26, 27] (Supplemental Methods) and the 234 patients who tested positive were excluded from this analysis as they already have a major risk factor for the development of critical COVID-19 [29]. In total, 1301 of the 3269 remaining patients had been included in previous studies restricted to a short list of 18 candidate genes [15, 19] or to the X chromosome [30], and 1968 had not been studied before. The mean age (SD) of the patients was 55.7 (17.4) years, with a male-to-female ratio of 2.4 (Table 1). The controls were significantly younger than the patients (P < 0.0001), with a mean age (SD) of 43.8 years (20.1 years) and were more likely to be female (P < 0.0001; male-to-female ratio = 0.7). The patients and controls were of various ethnic origins, mostly of European and Middle Eastern ancestry, according to principal component analysis (PCA) (Fig. 2). Raw sequencing data were either centralized in the HGID laboratory and processed with the HGID pipeline (2492 cases and 870 controls) or processed separately by each sequencing hub (777 cases and 503 controls; See Supplemental Methods). A joint analysis was performed first on the combined sample of 3269 patients and 1373 controls. Given the heterogeneity of the cohort due to different ancestries and processing pipelines, we also performed a trans-ethnic and a trans-pipeline meta-analysis; only results consistent across the three analyses are reported here (See Supplemental Methods).

Table 1 Baseline characteristics of study participants
Fig. 2
figure 2

Principal component analysis of patients with life-threatening COVID-19 (red) and SARS-CoV-2-infected controls (green). Principal component analysis (PCA) was performed with PLINK v1.9 software [40] on a pruned subset of ~ 14,600 exonic SNPs in linkage equilibrium (maximum r2 value for linkage disequilibrium of 0.4 between pairs of SNPs) with a minor allele frequency (MAF) > 1%, call rate > 99% and P value for departure from Hardy–Weinberg equilibrium > 10.−5. Samples were of diverse ethnic origins, including European (EUR), admixed American (AMR), North African (NAFR), sub-Saharan African (AFR), Middle Eastern (ME), South Asian (SAS), and East Asian (EAS)

Genome-wide analysis under a co-dominant model

We first performed a GW rare variant burden analysis on the 3269 patients with life-threatening COVID-19 and 1373 controls with asymptomatic/mild COVID-19 under a co-dominant model, using nine sets of variants (See Supplemental Methods). The QQ plots for the joint analysis of the samples revealed no systematic deviations from the null hypothesis, and the genomic inflation factors (λ) were close to 1 (Additional file 2: Table S1). In total, 18,064 genes were analyzed with at least one of the nine variant sets, resulting in an effective number of independent tests (Meff) for the joint analysis of 108,384, giving a Bonferroni-corrected significance threshold of 4.61 × 10−7. No gene was found to be of GW significance (see the Manhattan plot in Fig. 3A, Additional file 2: Table S2). The gene with the strongest association was TREH, encoding the trehalase enzyme, which hydrolyses trehalose, with rare (gnomAD allele frequency [AF] < 10−4) nonsynonymous variants associated with a lower risk of life-threatening COVID-19 (OR = 0.12[95% CI 0.05–0.28], P = 1.9 × 10−6; Additional file 2: Table S3). In analyses of genes for which rare predicted loss-of-function (pLOF) variants were associated with an increase in the risk of life-threatening COVID-19 (Table 2), the strongest association was that for NPC2, for rare (gnomAD AF < 0.01) pLOF variants, with 28 heterozygous carriers among patients (0.9%), and four heterozygous carriers (0.3%) among controls (OR = 5.41 [95% CI 1.8–16.4], P = 5.8 × 10−4). NPC2 encodes the Niemann-Pick disease type C2 protein and homozygous LOF mutations of this gene cause Niemann-Pick disease [51]. NPC2 interacts with NPC1, which is also an essential endosomal receptor for the Ebola virus [52, 53]. Both NPC1 and NPC2 were implicated in the regulation of SARS-CoV-2 entry in a CRISPR screen [54]. The GW burden analysis under a dominant model yielded similar conclusions (Additional file 2: Table S3).

Fig. 3
figure 3

Manhattan plot for genome-wide burden analysis under the co-dominant (top) and recessive (bottom) models. For each gene, the negative log-transformed p value of the joint analysis for the most significant variant set under a co-dominant (top) or recessive (bottom) model is plotted. For each gene, variant sets providing inconsistent results across the joint analysis, the trans-ethnic meta-analysis, and the trans-pipeline meta-analysis (i.e., P < 0.001 in the joint analysis and P > 0.05 in the trans-ethnic or trans-pipeline meta-analysis) were discarded. The red lines represent the significance threshold after Bonferroni correction to account for the total number of independent tests (P = 4.61 × 10−7 under a co-dominant model and 1.85 × 10−6 under a recessive model). The names of the top-ranked genes with a joint P < 10−4 are shown in red for rare variants associated with an increase in the risk of critical COVID-19 and in blue otherwise

Table 2 Top results of the genome-wide burden analysis for rare pLOF variants increasing the risk of life-threatening COVID-19 under a co-dominant model

Genome-wide analysis under a recessive model

We then performed a GW screen under a recessive model (autosomal and X-linked). In total, 4511 genes were analyzed with at least one of the nine variant sets, resulting in 27,066 independent tests, giving a Bonferroni-corrected significance threshold of 1.85 × 10−6. No gene reached GW significance (Fig. 3B). In analyses of genes with rare variants increasing the risk of life-threatening COVID-19, TLR7 was, by two orders of magnitude, the most significant gene, with 51 carriers (1.6%) of at least one rare (gnomAD AF < 0.01) missense or pLOF variant in patients versus two carriers (0.1%) in controls (OR = 8.41[95% CI 1.9–35.5], P = 8.95 × 10−5) (Table 3). Most of the carriers were male, with only one carrier among the patients and one among the controls being female. The variants carried by the two controls were previously shown to be biochemically neutral [19, 30] (Additional file 2: Table S4). The 51 cases carried 33 different variants, 13 of which had been shown to be neutral; 16 were previously shown to be hypomorphic or amorphic [19, 30], and four were previously unknown. The four new variants were tested: one was found to be neutral and the other three were deleterious (Additional file 1: Fig S1). Restricting the analysis to biochemically proven LOF variants (bLOF) decreased the number of carriers (20 cases vs. 0 controls), but the association signal remained highly significant, with a much higher odds ratio (OR = 27.68 [95% CI 1.5–528.7], P = 1.08 × 10−4) (Table 3). These findings confirm that TLR7 is a critical COVID-19 susceptibility locus, responsible for 0.9% of critical cases in male patients.

Table 3 Top results of the genome-wide burden analysis for rare variants increasing the risk of life-threatening COVID-19 under a recessive model

Genome-wide gene-based analysis including common variants

Published GWAS identified a number of common variants associated with severe COVID-19 pneumonia [8, 10, 12, 55]. We then assessed the combined effect of common and rare candidate coding variants at the gene level, in a weighted burden approach [56], as detailed in the Supplemental Methods. Briefly, for each individual, we calculated a genetic score by summing the number of minor alleles for each variant and weighting this sum by the frequency of the variant [57]. We then tested the association between this genetic score and case–control status in a logistic regression framework. As described above, we focused on pLOF only, pLOF and in-frame variants with CADD > MSC, or pLOF and in-frame variants without filtering on CADD score (Additional file 2: Table S5). As in the analysis focusing on rare variants only, no gene reached genome-wide significance after correction for multiple testing in this analysis considering both rare and common variants. The top-ranked gene, with consistent results across the joint analysis and the trans-ethnic and trans-pipeline meta-analyses, was TREH, with a protective effect against life-threatening COVID-19 of pLOF or nonsynonymous variants with a CADD score greater than the MSC (OR = 0.85 [95%CI 0.78–0.91], P = 3.6 × 10−6). Finally, we analyzed 20 candidate genes identified by GWAS for critical pneumonia in more detail [8, 10, 12, 55]. No significant association was detected for any of these genes (Additional file 2: Table S6), even with a relaxed Bonferroni threshold of 2.5 × 10−3, accounting for the number of GWAS genes.

Enrichment in rare pLOF variants at 13 type I IFN-related influenza susceptibility loci

Following on from our initial analysis [15], we also performed a candidate pathway enrichment analysis focusing on the 13 genes involved in Toll-like receptor 3 (TLR3)– and interferon regulatory factor 7 (IRF7)–dependent type I IFN immunity to influenza virus (IFNAR1, IFNAR2, IRF3, IRF7, IRF9, IKBKG, STAT1, STAT2, TBK1, TICAM1, TLR3, TRAF3, and UNC93B1) (Fig. 1). We confirmed the significant enrichment in rare (gnomAD AF < 10−3) pLOF variants at the 13 loci in patients with critical COVID-19, with 34 carriers among patients versus six among controls (OR = 3.70 [95% CI 1.7–9.5], P = 2.1 × 10−4 under a co-dominant model; Table 4). We also estimated this p value in a simulation study taking 13 loci randomly selected from a set of genes with similar pLI and CoNeS values (see Additional file 1: Supplemental Methods); we obtained an empirical p value of 3.7 × 10−4. Our cohort included 551 patients and 314 controls already screened for pLOF variants of the 13 genes included in our previous study [15] (Additional file 2: Table S7). The exclusion of these 551 cases and 314 controls resulted in a similar conclusion of enrichment in rare pLOF at the 13 loci (OR = 3.21 [95% CI 1.3–8.2], P = 5.97 × 10−3) formally replicating our initial association. Significant replication was also observed in the trans-ethnic (P = 0.01) and the trans-pipeline (P = 0.009) analyses. We found that 31 of the 34 carriers of pLOF variants were heterozygous, and three were homozygous: one for a frameshift variant of IRF7 described in a previous study [15], one for a previously reported deletion spanning 4394 base pairs in IFNAR1 [16, 19], and one for a previously unknown deletion spanning 6624 base pairs of IFNAR1 (Additional file 2: Table S8). All the biallelic pLOF variants were found in patients. Consequently, the OR for homozygous carriers (OR = 15.79 [95%CI 1.4–2170.4], P = 0.02) was higher than that for heterozygous carriers (OR = 3.11 [95%CI 1.4–8.6], P = 5.2 × 10−3), but both were significant.

Table 4 Enrichment analysis of rare pLOF/bLOF variants in genes involved in type I IFN immunity

Inclusion of TYK2 and TLR7 genes and branchpoint variants

Since the publication of the aforementioned study [15], AR TYK2 deficiency has been reported in children with COVID-19 pneumonia [19]. We identified two patients homozygous for a rare pLOF variant of TYK2 already described in a previous study [19] and one patient and one control heterozygous for a rare pLOF variant (Additional file 2: Table S8). Adding these patients to the analysis gave very similar results under a co-dominant model (OR = 3.30[95% CI 1.6–7.8], P = 1.4 × 10−4) and strengthened the evidence for association under a recessive model (OR = 19.65[95% CI 2.1–2635.4], P = 3.4 × 10−3) (Table 4). An analysis of the rare pLOF variants at these 14 loci plus the bLOF variants of TLR7 revealed highly significant enrichment (OR = 3.82 [95%CI 2.0–7.2], P = 1.3 × 10−7 under a co-dominant model). The effect was stronger for homozygous/hemizygous carriers (OR = 39.19 [95%CI 5.2–5037.01], P = 4.7 × 10−7) than for heterozygous carriers (OR = 2.27 [95%CI 1.0–5.2], P = 0.04), and these two ORs were significantly different (P = 0.008). We further screened the full cohort of cases and controls for intronic branchpoint (BP) variants, which might potentially have a strong impact on splicing and be considered pLOF variants, in the 15 type I IFN-related genes, with our new tool BPHunter [39]. We identified six branchpoint (BP) variants (Additional file 2: Table S9) carried in heterozygous state by 10 additional cases and no controls. Adding these BP variants to the analysis of the 15 type I IFN-related loci under a co-dominant model further strengthened the association signal (OR = 4.40 [2.3–8.4], P = 7.7 × 10−8) (Table 4).

Age and sex stratified analysis of the 15 type I IFN-related loci

Advanced age is the strongest risk factor for life-threatening COVID-19. Male individuals are also at higher risk than female individuals. As for the main GWAS hits [58, 59], we performed an analysis stratified for age and sex for the 15 type I IFN-related loci. The analysis stratified for sex revealed a much stronger association signal in male than in female individuals, as expected given the X-linked recessive mode of inheritance of TLR7 deficiency (Additional file 2: Table S10). Nevertheless, the enrichment in rare pLOF variants at the 15 loci in female subjects remained significant under a co-dominant model (P = 0.02) and a recessive model (P = 0.05). The addition of the BP variants strengthened the association signal in female subjects under a co-dominant model (P = 3.7 × 10−3). In the analysis stratified for age, we assigned the cases to two age groups (under 60 years of age vs. 60 years and over), which we compared with all controls. We used an age cut-off of 60 years, which was close to the median age of the cases, in accordance with the analyses performed in [7, 59]. The age stratified analysis revealed a strong impact of age, the genetic effect being restricted to younger cases (OR = 4.65 [2.4–9.0], P = 2.2 × 10−9, Additional file 2: Table S10). Accordingly, the 67 patients with critical COVID-19 carrying a rare pLOF or bLOF variant of one of the 15 genes were significantly younger than the remaining 3202 patients in the cohort (mean age [SD] in years: 43.68 [19.4] vs. 56.0 [17.3] years; P = 2.3 × 10−6), consistent with our previous reports that IEIs conferring a predisposition to life-threatening COVID-19 are more frequent in young patients [1, 15, 30]. Moreover, the homozygous/hemizygous carriers were significantly younger than the heterozygous carriers (35.2 [20.3] vs. 48.7 [17.1] years, P = 0.008, Additional file 1: Fig S2). Overall, these results clearly demonstrate that the search for additional rare variants conferring a strong predisposition to life-threatening COVID-19 benefits from focus on younger patients.

In-frame nonsynonymous variants at the 15 loci

We further screened our cohort for rare in-frame nonsynonymous variants with a gnomAD AF < 10−3 at these type I IFN-related susceptibility loci. For the 13 initial loci, the enrichment disappeared when in-frame nonsynonymous variants were added to pLOF variants under a co-dominant model (OR = 1.08 [95%CI 0.9–1.3], P = 0.42) (Additional file 2: Table S11), whereas a non-significant trend persisted under the recessive model (OR = 5.02 [95% CI 0.7–52.7], P = 0.06). Focusing exclusively on in-frame variants decreased the strength of this trend considerably, with only eight homozygous carriers among patients and one among controls (OR = 1.14 [0.2–912.5], P = 0.68). Adding TYK2 variants led to similar conclusions (Additional file 2: Table S11). We then added TLR7 variants and considered the 15 loci together. Under a co-dominant model, the enrichment became non-significant when in-frame nonsynonymous variants were added (OR = 1.15 [1.0–1.4], P = 0.09), but enrichment remained significant under a recessive model (OR = 6.54 [2.4–24.8], P = 5.3 × 10−6; Additional file 2: Table S11). In analyses considering only rare in-frame homozygous/hemizygous nonsynonymous variants, the effect size was smaller, but the enrichment remained significant (OR = 3.52 [1.3–13.3], P = 2.8 × 10−3). In total, 41 patients carried a rare homozygous/hemizygous in-frame nonsynonymous variant at one of the 15 loci, and 16 of these variants (carried by 16 patients) were TLR7 in-frame variants already shown to be bLOF. After excluding the TLR7 bLOF variants, there was no residual significant enrichment in rare in-frame nonsynonymous variants in patients relative to controls, whatever the genetic model considered.

Discussion

In this exome-wide gene burden analysis for rare variants underlying critical COVID-19, no gene reached GW statistical significance after accounting for multiple testing. We used simulations to determine the power of our sample to detect an association at the 2.5 × 10−6 exome-wide significance threshold (Additional file 1: Fig S3); our sample had a power of more than 80% for detecting alleles with a carrier frequency of 5 × 10−3 in the general population and a relative risk of critical COVID-19 of at least 6. These results are consistent with those of two previous large exome-wide studies including more than 1000 critical cases and thousands of population-based controls that found no statistically significant autosomal gene burden associations at stringent significance thresholds accounting for the number of phenotypes and variant sets analyzed [11, 21]. However, under a recessive model, the strongest association—albeit not statistically significant at GW level—was obtained with the X-linked TLR7 gene, for which association has consistently been reported across studies [21, 22, 30, 32], reaching the less conservative exome-wide significance threshold of 2.5 × 10−6 in some of these previous studies [21, 22]. It should be stressed that stringent correction for multiple testing, while necessary to avoid false positives, is a conservative strategy, and that the lack of formal statistical significance at a GW level does not preclude biological causality and medical significance. The burden of proof can be provided experimentally via biochemical, virological, and immunological experiments, as our previous studies of TLR7 in which we showed that biochemically deleterious TLR7 variants blunted the pDC-dependent sensing of SARS-CoV-2 and induction of type I IFN, thereby accounting for ~ 1% of critical pneumonia cases in men [30]. Additional genes may be found by restricting the association analysis to variants experimentally proven to be deleterious.

This analysis also confirms our previous findings of an enrichment in rare pLOF variants of 13 genes involved in TLR3- and IRF7-dependent type I IFN immunity to seasonal influenza virus in critical cases relative to controls with mild/asymptomatic infection [15]. These results were strengthened by the addition of TYK2, which was recently shown to underlie severe COVID-19 [19, 20], and TLR7, especially under a recessive model. We found that homozygous/hemizygous carriers of rare pLOF or bLOF variants at the 15 loci had a significantly higher risk of life-threatening COVID-19 than heterozygotes. This is consistent with the generally higher clinical penetrance of recessive than dominant IEI [1]. Overall, 1.7% of the patients with life-threatening COVID-19 carried a rare pLOF or bLOF variant at one of the 15 loci, these variants being homozygous/hemizygous in 0.8% of cases. Adding the BP variants at the 15 loci increased the proportion of carriers among patients with life-threatening COVID-19 to 2.1%. One of the STAT2 BP variants identified (2:56749159:T > A) has already been validated experimentally [39], but the effects of the five other BP variants identified require confirmation. The study of in-frame nonsynonymous variants might also increase this proportion, but would require the experimental characterization of all these variants. Indeed, in analyses restricted to rare in-frame nonsynonymous variants, we detected no significant enrichment in patients relative to controls. This result is not surprising, as we showed in a previous study [15] that less than 15% of the rare in-frame nonsynonymous variants at the 13 loci carried by cases initially studied were bLOF variants, whereas all the pLOF variants were found to be bLOF. Similar results were obtained for TLR7, with only 10 of 108 (9.2%) in-frame nonsynonymous variants observed in gnomAD being bLOF [30]. This high proportion of neutral variants strongly affects the power of burden tests and highlights the need for the experimental characterization of variants.

We also showed that patients carrying rare pLOF or bLOF variants of these 15 type I IFN-related genes were significantly younger than the remaining patients (mean age [SD] in years: 43.3 [20.3] vs. 56.0 [17.3] years). This was particularly true for homozygous/hemizygous carriers of rare pLOF or bLOF variants (35.2 [20.3] years), potentially accounting for the lack of replication of this finding by other studies including older patients [11, 21,22,23]. Consistent with this result, we recently found that ~ 10% of children hospitalized for COVID-19 pneumonia carry recessive inborn errors of type I IFN immunity [19]. In addition, older patients are more likely to carry auto-Abs against type I IFN, and unlike previous studies, we excluded patients carrying such antibodies from this analysis. None of the 234 patients with critical COVID-19 excluded from this study due to the presence of auto-Abs against type I IFN carried a rare pLOF variant of the 15 genes. Hence, samples in which the vast majority of patients are over the age of 60 years and of unknown status for auto-Abs against type I IFNs would have a much lower power to identify these rare inborn errors of type I IFN immunity.

Conclusions

Rare autosomal inborn errors of type I IFN-dependent immunity to influenza viruses can underlie critical forms of COVID-19, especially in subjects below 60 years of age, in addition to X-linked TLR7 deficiency. The search for additional rare mutations conferring a strong predisposition to life-threatening COVID-19 should focus on young patients with critical COVID-19 without auto-Abs against type I IFNs.