Introduction

Asthma is a complex respiratory disease characterized by reversible airflow obstruction and chronic inflammation of the lower respiratory tract, usually linked to allergic and atopic manifestations1. Its prevalence is increasing and it affects more than 300 million people worldwide2,3, which entails high health care costs. In Europe, the economic burden of asthma in adults is estimated at over €19 billion per year4. This global health problem is partly due to the heterogeneity of the disease, which makes both its clinical management and research challenging. Asthma and allergic traits are strongly conditioned by genetic factors, with heritability estimates ranging between 35 and 95%5. Genome-wide association studies (GWAS) have revealed several asthma risk genes, including those encoding the ORMDL Sphingolipid Biosynthesis Regulator 3 (ORMDL3), Gasdermin B (GSDMB), and Cadherin Related Family Member 3 (CDHR3), as well as a number of genes related to innate immunity and immunoregulation6.

Despite the efforts made by large consortia, less than 5% of asthma variability can be explained by the genetic risk variants revealed to date7. Among other reasons, this could be because most of the previous genetic association studies have focused on patients of European ancestry8,9,10. Therefore, other genetic risks that are more frequent in other ethnicities could remain unknown. The key implications of ancestry in asthma risks have been evidenced in previous studies in non-European admixed populations that revealed novel gene risks for asthma in African American and Latino populations10,11,12,13,14. However, although in recent years the number of genetic studies in admixed populations has increased considerably, the statistical power in many of them has not been sufficient due to the large sample sizes required by GWAS to overcome the stringent significance penalties, making screening for asthma-related genes in admixed populations still a challenge10.

In this context, studies leveraging local genetic ancestry analyses constitute an alternative to further disentangle the genetics underlying asthma in recently admixed populations11,15,16,17,18,19, given that they attain better statistical power with more limited sample sizes20. Briefly, the genome of admixed individuals is composed of chromosomal segments inherited from their parental populations. The average proportion with which each parental group contributes to the genome of an admixed individual is known as global genetic ancestry, whereas local genetic ancestry is defined as the ancestry proportion at each particular locus. Admixture mapping studies can reveal genomic regions where local ancestry correlates with disease risk; the main challenge lies in the subsequent fine mapping of these regions to identify the causal variants17,21.
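
The relationship between local and global ancestry can be illustrated with a minimal Python sketch. The individual, the loci, and the ancestry calls below are hypothetical and are not data from this study; for simplicity, every locus is given equal weight, whereas real analyses would typically account for segment length or marker density.

```python
# Hypothetical local ancestry calls for one admixed individual.
# Each entry gives the number of haplotype copies (0, 1, or 2) inherited
# from each parental population at one locus; copies sum to 2 per locus.
local_ancestry = [
    {"EUR": 2, "NAF": 0, "SSA": 0},
    {"EUR": 1, "NAF": 1, "SSA": 0},
    {"EUR": 1, "NAF": 1, "SSA": 0},
    {"EUR": 2, "NAF": 0, "SSA": 0},
    {"EUR": 1, "NAF": 0, "SSA": 1},
]


def global_ancestry(loci):
    """Average per-locus ancestry copies into genome-wide proportions.

    Assumes equally weighted loci (an illustrative simplification).
    """
    total_copies = 2 * len(loci)
    populations = loci[0].keys()
    return {
        pop: sum(locus[pop] for locus in loci) / total_copies
        for pop in populations
    }


proportions = global_ancestry(local_ancestry)
print(proportions)
```

In this toy example the global proportions are simply the averages of the local calls, whereas admixture mapping would test each locus-level value against disease status.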

We previously assessed the local ancestry estimates in an admixed southwestern European population, the Canary Islands population (Spain), revealing the largest African genetic ancestry in Europe, with average proportions of 22.0% of North African (NAF) ancestry and 3.0% of Sub-Saharan African (SSA)22. Additionally, we identified five genomic regions showing an excess of African ancestry in this population within chromosomes 2, 3 (two regions), 6, and 13. These regions were enriched in genes linked to respiratory diseases, including asthma22, which is not surprising since the Canary Islanders have the highest prevalence of asthma in Spain23,24. As a matter of fact, we recently performed an admixture mapping of asthma in this population, revealing a novel locus associated with asthma risk16.

Based on the previous evidence, we hypothesized that the targeted screening of the five genomic regions from the current Canary Islander population that were found to be enriched in African alleles could reveal novel asthma risks. To test this possibility, we conducted a two-stage association study focused on the regions of interest in Canary Islanders, followed by a fine-mapping study of the highly variable human leukocyte antigen (HLA) region.

Results

After quality control, 930 Canary Islanders from stage 1 (313 cases and 617 controls) and 557 from stage 2 (251 cases and 306 controls) remained in the study. Detailed information on these participants can be found in Table 1. Association testing focused on a total of 140,955 imputed genetic variants located in the five loci enriched in African ancestry in the Canary Islands population (2q21.2-q22.3, 3p25.3, 3q26.32, 6p22.3-p21.32, and 13q21.1-q21.33). The number of variants in each region can be found in Supplementary Table S1. After meta-analysis of results from the two stages, the variant rs1049213 from 6p22.3-p21.32 was significantly associated with asthma risk (OR [95%CI] = 1.74 [1.42–2.14]; p = 1.30 × 10–7) (Table 2). This variant is in the 3′ untranslated region (3′ UTR) of the Major Histocompatibility Complex (MHC), Class II, DQ Beta 1 gene (HLA-DQB1) (Supplementary Figure S1). We accessed public GWAS data available at Open Targets Genetics25 and found that this single nucleotide polymorphism (SNP) has previously been associated with asthma (p-value = 1.13 × 10–38, OR = 1.12) (http://www.nealelab.is/uk-biobank/) and with a broad allergic phenotype26. No significant variants were identified in the other regions of interest on chromosomes 2, 3, and 13.
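
As a rough consistency check on the reported effect for rs1049213, the following Python sketch back-calculates a Wald z statistic and two-sided p-value from the odds ratio and its 95% confidence interval. It assumes a symmetric interval on the log-odds scale and a normal approximation; because the published OR and CI are rounded, the result can only approximate the reported p-value. The function name is illustrative.

```python
import math


def wald_p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate z and two-sided p from an OR and its 95% CI.

    Assumes the CI is symmetric on the log-odds scale (Wald interval).
    """
    log_or = math.log(odds_ratio)
    # The CI width on the log scale spans 2 * z_crit standard errors.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Values reported for rs1049213 in the meta-analysis.
z, p = wald_p_from_or_ci(1.74, 1.42, 2.14)
print(f"z = {z:.2f}, p = {p:.2e}")
```

The resulting p-value is on the order of 10–7, in line with the value reported above and below the targeted-association threshold of 1.20 × 10–6.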

Table 1 Demographic and clinical characteristics of the individuals included in the study.
Table 2 Ten most significant variants after the meta-analysis of the targeted association. Association results from the 6p22.3-p21.32 region for stage 1, stage 2, and meta-analysis.

The HLA region is one of the most polymorphic regions of our genome, which makes it difficult to investigate its role in the disease. In this sense, new methods have emerged to impute alleles of classical HLA genes (“classical alleles” from now on) and amino acid polymorphisms from SNP genotyping data27,28,29, providing additional information that can be used to assess the role of this region in asthma physiopathology. We fine mapped this region using a specific method to impute a total of 172 common HLA classical alleles from three class I genes (-A, -B, -C) and four class II genes (-DPB1, -DQA1, -DQB1, -DRB1) and assessed their association with asthma susceptibility (Supplementary Table S2). A total of 10 classical alleles within six different HLA genes showed a nominal association with asthma in stage 1 (p < 0.05) (Table 3). Two of them, HLA-DQA1*01:02 and HLA-DQB1*06:04, showed nominal significance and consistent direction of effects in stage 2 (p = 0.007 and p = 0.025, respectively). The association with asthma protection of HLA-DQA1*01:02 reached study-wise significance in the meta-analysis results from stage 1 and stage 2 (OR [95% CI] = 0.64 [0.50–0.82], p = 3.98 × 10–4) (Table 3, Fig. 1). This association was robust to the model adjustments for sex, age, body mass index, and local NAF or SSA ancestry (Supplementary Table S3), which suggests a correlation between both HLA-DQA1 and HLA-DQB1 variants. Accordingly, HLA haplotype analyses revealed that the haplotype DQA1*01:02-DQB1*06:04 was significantly associated with asthma protection (meta-analysis p = 4.71 × 10–4, OR [95% CI] = 0.47 [0.29–0.73]) after setting a Bonferroni threshold at p = 2.17 × 10–3 (based on the 23 DQA1-DQB1 haplotypes tested, Table 4). Interestingly, HLA-DQB1*06:04 was the allele with the second most significant association in our study.
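
The Bonferroni thresholds quoted above can be reproduced with a short Python sketch. The family-wise error rate of 0.05 is an assumption on our part (it is not stated explicitly, but it is consistent with the haplotype threshold of 2.17 × 10–3 for 23 tests); the function name is illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold controlling the family-wise error rate."""
    return alpha / n_tests


# Assumed family-wise alpha of 0.05; 23 DQA1-DQB1 haplotypes were tested.
haplotype_threshold = bonferroni_threshold(0.05, 23)
print(f"{haplotype_threshold:.2e}")  # 2.17e-03

# Reported meta-analysis p-values compared against their thresholds.
haplotype_is_significant = 4.71e-4 < haplotype_threshold  # DQA1*01:02-DQB1*06:04
allele_is_significant = 3.98e-4 < 4.50e-4                 # HLA-DQA1*01:02
print(haplotype_is_significant, allele_is_significant)
```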

Table 3 Classical HLA alleles nominally significant in stage 1 and their results in stage 2 and in the meta-analysis.
Figure 1

Manhattan plot of meta-analysis results for the association study of chromosome 6 region (grey) and the classical HLA alleles (blue). The y-axis displays transformed p-values (–log10) while the x-axis represents chromosome positions (GRCh37/hg19). The horizontal lines correspond to the significance thresholds of each study after Bonferroni correction: the upper for the targeted association (p = 1.20 × 10–6) and the lower for the classical HLA alleles mapping (p = 4.50 × 10–4). The significant variants are highlighted in orange (targeted association testing of SNPs) and red (fine mapping of classical HLA alleles). The code used to plot the data was obtained from HATK30.

Table 4 Results of the HLA haplotype assessment after meta-analysis.

Translation of the HLA-DQA1*01:02 sequence to the amino acid sequence revealed a specific missense substitution within exon 2 at position 57 (Glu57Gln, E57Q) (Supplementary Figure S2). Mapping each nucleotide position within the altered codon indicated that E57Q corresponds to rs10093, which did not reach statistical significance in our study (meta-analysis OR [95%CI] = 1.13 [0.96–1.33], p = 0.139). However, a few of its linkage disequilibrium (LD) proxies (r2 > 0.77 in Europeans) were nominally significant in the meta-analysis (results for the leading variant rs9271588, OR [95% CI] = 1.20 [1.02–1.41], p = 0.027). Bioinformatic tools for variant prioritization (DSNetwork, VEP, and RegulomeDB) showed that the top-ranked proxy for rs10093 was rs9271588, an intergenic variant located between HLA-DQA1 and HLA-DRB1, which was predicted to have the highest confidence of regulatory impact (Supplementary Table S4). In this sense, rs9271588 was found to be linked to relevant functional consequences based on diverse in silico approaches (Supplementary Table S4). In summary, this SNP is located within regulatory elements of genes in a subset of tissues and cell types, including enhancer and promoter histone marks and DNase I hypersensitive sites (Supplementary Table S4). Furthermore, rs9271588 is also implicated in disrupting regulatory motifs and protein binding (Supplementary Table S4). Hi-C experiment results supported physical chromatin interactions between the region harbouring rs9271588 and HLA-DQB1 and HLA-DRB1 promoter regions in diverse cell lines (Supplementary Table S4). Additionally, GTEx data supported that rs9271588 is an eQTL and sQTL for HLA-DQA1, HLA-DQB1, and HLA-DRB1 in different tissues, including lung, esophagus, and whole blood (Supplementary Table S4). This SNP also has high CellulAr dePendent dEactivating (CAPE) scores for eQTLs for HLA-DQA1, HLA-DQB1, and HLA-DRB1 in a lymphoblastoid cell line (Supplementary Table S4).
Finally, transcriptomic data from bronchial brushing and bronchoalveolar lavage (BAL) samples revealed that HLA-DQA1 (only available for bronchial brushing), HLA-DQB1, and HLA-DRB1 were downregulated in patients with severe asthma compared to healthy controls (Supplementary Figure S3 and Supplementary Figure S4).

Discussion

Several studies suggest that non-European populations carry the greatest burden of asthma10. However, the genetic diversity assessed in the published GWAS of asthma is strongly biased towards Central and Northern Europeans, and additional genetic studies in more diverse populations are required to identify novel genetic risk factors for asthma10. Our study provides significant insights into the genetics underlying asthma susceptibility in a population with the largest recent African admixture recorded so far among southwestern Europeans31. Besides, the available estimates support that the Canary Islands population has the greatest prevalence of asthma in Spain23,24. Here we performed a targeted staged association study of five genomic regions that we previously described to be enriched in African ancestry among Canary Islanders, and revealed that genetic variants in 6p22.3–p21.32 and within HLA-DQA1 and HLA-DQB1 were significantly associated with asthma risk. HLA genes are located within the MHC and encode a group of proteins with important functions in cell–cell interactions that are critical regulators of the immune response32. In fact, a number of genetic variants from the MHC (some of them linked to HLA-DQA1 and HLA-DQB1) have been associated with asthma in large genetic association studies10,33,34. Interestingly, other studies in populations with African ancestry have linked classical HLA class II gene alleles to total serum IgE levels in patients with asthma35,36.

Our staged association study revealed the association of rs1049213 with asthma risk, which has been previously linked to asthma, supporting the robustness of our results. It is well known that interpreting SNP associations can be problematic, partly due to the difficulty in identifying the causal variant37, and this interpretation becomes even more complicated when the prioritized variant is in the highly polymorphic HLA region. For this reason, classical HLA alleles are more biologically informative and can be related to stronger effects than individual SNPs38,39. Our results revealed that the classical HLA allele HLA-DQA1*01:02 was significantly associated with asthma protection. HLA-DQA1*01:02 had never been linked to asthma risk before, although it has been related to protection from infectious40 and autoimmune diseases, including type-1 diabetes41, Crohn’s disease42, and tubulointerstitial nephritis and uveitis syndrome43, and to risk for peanut allergy44. Additionally, HLA haplotype analyses revealed, for the first time, the association of DQA1*01:02-DQB1*06:04 with asthma protection, which supports that the combination of variants within both genes could influence the pathophysiology of the disease. This is in line with previous studies in asthma where HLA haplotypes were identified to be significantly associated with the disease45. The protein encoded by HLA-DQA1 binds to the protein encoded by HLA-DQB1, constituting a heterodimer that plays a central role in the immune system. Interestingly, our analyses prioritized an LD proxy of the missense change defining HLA-DQA1*01:02 as a clear lung eQTL for HLA-DQA1 and HLA-DQB1. This evidence agrees with human transcriptomic observations revealing a reduction of HLA-DQA1 and HLA-DQB1 lung gene expression in patients with severe asthma compared to healthy controls. Thus, a lower activity of the heterodimer among asthma patients would be expected in carriers of this classical HLA allele.

We acknowledge some limitations of this study. First, all individuals were selected based on their self-declaration of having two generations of ancestors born in the Canary Islands. Nevertheless, this agrees with the National Institutes of Health (NIH) guidelines, and previous genetic studies in the same population have shown that this selection has no major effect on the findings of the study16,22. Additionally, the use of population controls could bias the results, since these individuals could develop asthma or related phenotypes during their lives. However, this type of control has been widely used in genetic and genomic studies and provides a representative subset of the population, reducing selection bias10,46. We did not include other confounders such as environmental factors or responses to asthma treatment because data were lacking for most individuals. However, we validated our results in an independent case–control sample, supporting that our results may not be biased by these variables. Significant differences in sex and age were found between cases and controls from both stages, although results remained significant after including these variables in the models. Finally, in addition to the difficulties inherent to the highly polymorphic HLA region, because of technical limitations (e.g., array content and imputation algorithm), we did not assess all common classical HLA alleles of this population, which could have masked additional asthma associations.

Conclusions

A two-stage association study targeting the genomic regions with an excess of African ancestry in the Canary Islands population revealed a novel classical HLA allele (HLA-DQA1*01:02) associated with asthma protection. This suggests that not all shared genetic risks between asthma and autoimmune diseases have opposite directions of effect47. Further studies will be needed to validate these results in other populations, as well as to assess the potential of HLA-DQA1*01:02 as an asthma biomarker.

Methods

Study design and participants

The study was performed in accordance with The Code of Ethics of the World Medical Association (Declaration of Helsinki) and approved by the Research Ethics Committee from the Hospital Universitario Nuestra Señora de Candelaria. Written informed consent was obtained from all subjects or their representatives.

We performed a two-stage case–control study of asthma susceptibility targeting the five genomic regions that showed an excess of African ancestry in the current population of the Canary Islands (chr2:133,952,040–144,266,489; chr3:10,539,482–11,710,471; chr3:177,443,968–178,679,751; chr6:24,703,442–36,288,651; and chr13:57,962,413–70,091,195)22. Stage 1 was performed to prioritize genetic variants within these regions, while stage 2 allowed us to validate these associations in an independent data set. A meta-analysis combining stage 1 and stage 2 association results was performed to establish the genetic variants associated with asthma in this study.

All case and control individuals declared at least two generations of ancestors born in the Canary Islands (Spain). Stage 1 consisted of 314 cases of asthma and 674 controls, while stage 2 comprised a total of 278 asthma patients and 349 controls. All asthma patients were part of the Genetics of Asthma (GOA) study in the Spanish population48 and were diagnosed according to the Global Initiative for Asthma (GINA) guidelines49. Population controls were obtained from the Cardiovascular, Diabetes, and Cancer (CDC) cohort study from the Canary Islands50. DNA was extracted from peripheral blood following column-based methods. Further details of these studies can be found elsewhere16,22,51.

Genotyping and statistical analyses

All DNA samples were subject to genotyping of 587,352 SNPs using the Axiom Genome-Wide Human CEU 1 Array (Thermo Fisher Scientific, Waltham, MA, USA) by the Spanish Genotyping Center (CeGen). Variant calling was performed separately in cases and controls from the discovery study and low-quality SNPs and samples were excluded according to the manufacturer’s instructions. Stringent quality controls were subsequently conducted using R (v3.2.2)52 and PLINK v1.953. Genetic variants with genotype call rates (CR) < 95%, low minor allele frequency (MAF < 5%), or that deviated from Hardy–Weinberg equilibrium (HWE, p < 1.0 × 10−6) were excluded. A total of 403,615 filtered high-quality SNPs were kept for downstream analyses. We also excluded individuals with missing clinical information, sex mismatches between records and those inferred from the genotype data, CR < 95%, a high degree of kinship with others included in the study (PI_HAT > 0.2), or that constituted heterozygosity outliers. A principal component analysis (PCA) was performed using PLINK v1.953 to obtain the leading principal components (PCs) to correct for population stratification during association testing. The PCA was performed for stages 1 and 2 separately, and individuals were projected on data from The 1000 Genomes Project54 (Fig. 2).
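
The variant-level filters described above can be sketched in Python as follows. This is an illustration rather than the pipeline used in the study: the genotype counts are hypothetical, the function and field names are our own, and the HWE test shown is the 1-degree-of-freedom chi-squared approximation rather than the test implemented in PLINK.

```python
import math


def variant_qc(n_aa, n_ab, n_bb, n_missing,
               min_call_rate=0.95, min_maf=0.05, hwe_p_min=1.0e-6):
    """Apply call-rate, MAF, and HWE filters to one biallelic variant."""
    n_called = n_aa + n_ab + n_bb
    call_rate = n_called / (n_called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1 - freq_a)

    # Expected genotype counts under Hardy-Weinberg equilibrium.
    expected = (
        n_called * freq_a ** 2,
        2 * n_called * freq_a * (1 - freq_a),
        n_called * (1 - freq_a) ** 2,
    )
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    hwe_p = math.erfc(math.sqrt(chi2 / 2))

    passes = (call_rate >= min_call_rate
              and maf >= min_maf
              and hwe_p >= hwe_p_min)
    return {"call_rate": call_rate, "maf": maf, "hwe_p": hwe_p, "pass": passes}


# Hypothetical variants: one in HWE, one with no heterozygotes at all.
good = variant_qc(n_aa=360, n_ab=480, n_bb=160, n_missing=10)
bad = variant_qc(n_aa=500, n_ab=0, n_bb=500, n_missing=0)
print(good["pass"], bad["pass"])
```

The default thresholds mirror those stated in the text (CR ≥ 95%, MAF ≥ 5%, HWE p ≥ 1.0 × 10−6); sample-level filters such as kinship and heterozygosity are not covered by this sketch.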

Figure 2

Plot of the first two principal components (explaining 78.95% of variability) of the individuals analysed in stage 1 and 2, projected on data of African (AFR), East Asian (EAS), European (EUR), and South Asian (SAS) populations from The 1000 Genomes Project.

Variant imputation of chromosomes 2, 3, 6, and 13 was then conducted on filtered data using the Michigan Imputation Server55, selecting SHAPEIT v2.r790 for chromosome phasing56 and European population data from the Haplotype Reference Consortium release 1.1 as the reference panel57. Logistic regressions were performed with EPACTS v3.2.658 using a binary Wald test and assuming an additive inheritance model. We included the first four PCs as covariates. Variants with MAF < 1% or poor imputation quality (Rsq < 0.3) were excluded from the analysis. The genomic inflation factor (λ) of the results was calculated with the R package “qqman”59 (Supplementary Figure S5).
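
For readers unfamiliar with the genomic inflation factor, the underlying calculation can be sketched in Python using the common median-based definition (an assumption on our part about the exact estimator). The p-values below are synthetic and uniformly spaced, so λ is expected to be close to 1; the function name is illustrative.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Median-based lambda: observed vs. expected 1-df chi-squared median."""
    std_normal = NormalDist()
    # Convert each two-sided p-value to a 1-df chi-squared statistic.
    chi2 = [std_normal.inv_cdf(p / 2) ** 2 for p in p_values]
    # Median of a 1-df chi-squared distribution under the null (about 0.455).
    expected_median = std_normal.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median


# Synthetic, evenly spaced p-values mimicking a null distribution.
null_p_values = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation(null_p_values)
print(f"lambda = {lam:.3f}")
```

Values of λ well above 1 would suggest residual population stratification or other systematic inflation of the test statistics.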

Genotyping and statistical analyses of the data from the two stages followed the same procedures. To assess the overall effect size of associated SNPs across the two stages, a meta-analysis was conducted using METASOFT v2.0.1, where effect heterogeneity was assessed with the Cochran's Q test significance60. Only variants showing the same direction of effects on both studies were considered. We estimated the effective number of independent tests with the Genetic type 1 error calculator (GEC)61 and established the meta-analysed significance at p = 1.20 × 10–6 based on Bonferroni correction.
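
The logic of combining the two stages can be sketched with an inverse-variance fixed-effects meta-analysis and Cochran's Q test in Python. This is a simplified stand-in, not the METASOFT implementation, and the per-stage effect sizes and standard errors below are hypothetical values chosen only for illustration.

```python
import math


def meta_two_stages(beta1, se1, beta2, se2):
    """Inverse-variance fixed-effects meta-analysis of two studies."""
    w1, w2 = 1 / se1 ** 2, 1 / se2 ** 2
    beta = (w1 * beta1 + w2 * beta2) / (w1 + w2)
    se = math.sqrt(1 / (w1 + w2))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value

    # Cochran's Q for heterogeneity; with two studies it has 1 df.
    q = w1 * (beta1 - beta) ** 2 + w2 * (beta2 - beta) ** 2
    q_p = math.erfc(math.sqrt(q / 2))
    return {"beta": beta, "se": se, "or": math.exp(beta),
            "p": p, "q": q, "q_p": q_p}


# Hypothetical per-stage log odds ratios and standard errors.
b1, s1 = math.log(1.80), 0.13
b2, s2 = math.log(1.65), 0.17

same_direction = b1 * b2 > 0  # only concordant variants were retained
result = meta_two_stages(b1, s1, b2, s2)
print(same_direction, f"OR = {result['or']:.2f}", f"Q p = {result['q_p']:.2f}")
```

The pooled estimate lies between the two stage-specific estimates and has a smaller standard error than either, which is what allows the combined analysis to reach thresholds that a single stage may not.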

Fine mapping of the HLA region

A more detailed assessment of the HLA region, residing in chromosome 6, was performed on both stages separately. Classical HLA alleles from three class I genes (-A, -B, -C) and four class II genes (-DPB1, -DQA1, -DQB1, -DRB1) were imputed with HLA Genotype Imputation with Attribute Bagging (HIBAG) v1.4 using imputation models adapted to the Axiom Genome-Wide Human CEU 1 Array and a European reference panel27. The association analysis with asthma was also performed with HIBAG using an additive model on those individuals with a high confidence score (probability threshold > 0.5). The first four PCs were included in the models. A meta-analysis was conducted with METASOFT v2.0.160 for the common alleles (MAF ≥ 1%) that showed a nominal association (p < 0.05) and the same direction of effects in both stages. For this study, a meta-analysed significance threshold was declared at p = 4.50 × 10–4 after Bonferroni correction based on the number of alleles tested. A post-hoc sensitivity analysis was also performed to address potential biases of the results by including sex, age, and local ancestry estimates as covariates in the HIBAG association models. The estimation of local ancestry was carried out using ELAI62 assuming a three-way admixture (European, NAF, and SSA), as detailed elsewhere16,22. HLA haplotype analyses were performed using BIGDAWG63, focusing on those genes that were in the vicinity of the variants that showed significant association. Statistical testing of the haplotype difference was based on a Chi-squared test, using BIGDAWG’s default parameters.

Functional annotation of variants and gene expression

We used HIBAG to convert the associated P-coded HLA alleles to amino acid sequences, in order to reveal amino acid residues and, hence, those SNPs that may explain the classical HLA allele associations. We explored the potential biological consequences of the SNP predicting the altered codon in the amino acid in the classical HLA allele and its best proxies (i.e., in strong LD in Europeans, r2 > 0.7) by using different in silico tools. Variant prioritisation was based on results obtained with DSNetwork64, RegulomeDB65, and VEP66. Additionally, we assessed the potential regulatory role of the variants using HaploReg v4.167 and RegulomeDB, as well as the existence of long-distance physical interactions with Capture Hi-C Plotter68. We also accessed GTEx69, ExSNP70, and SNPdelScore71 to evaluate tissue-specific local expression quantitative trait loci (eQTLs) and splicing quantitative trait loci (sQTLs). Further details are provided in the Supplement.

In parallel, we accessed the results of public gene expression studies of asthma available in Gene Expression Omnibus (GEO). Differential gene expression between cases with asthma and healthy controls was examined for those genes in the vicinity of the SNPs and classical HLA alleles significantly associated with asthma susceptibility in our study, to provide additional information on the role of these genes in asthma physiopathology. First, we accessed transcriptomic data of bronchial brushing samples from 27 healthy controls and 128 individuals with asthma (72 non-severe asthma, 56 severe asthma) (GSE63142)72. Then, we accessed gene expression results of BAL samples from 12 healthy controls and 74 asthmatic individuals (28 non-severe asthma, 46 severe asthma) (GSE74986)73. The differential expression was assessed with two-sample t-tests and one-way analyses of variance (ANOVA). Further details are provided in the Supplement.