Background

The Azores is a Portuguese archipelago composed of nine islands distributed among three geographical groups: the Eastern (São Miguel and Santa Maria), the Central (Terceira, Pico, Faial, São Jorge and Graciosa) and the Western (Flores and Corvo). Portuguese explorers discovered the archipelago in 1427, but settlement only began in 1439 and was a long and difficult process. Historical data report a contribution from people with genetic backgrounds other than Portuguese, including Flemish, Spanish, French, Italian, German, Scottish and Jewish settlers, as well as Moorish prisoners and black slaves from Guinea, Cape Verde and São Tomé [1]. São Miguel is the largest island of the Azores, with 131,609 inhabitants (2001 Census, Portugal National Institute of Statistics). Several studies have been performed to characterize the genetic pool of the Azoreans [2-10]. These studies report high genetic variability and heterogeneity in the Azorean population, explained by the settlement history of the islands, in which a major contribution of individuals from mainland Portugal is evident. Moreover, the data revealed an absence of population structure, despite the archipelago's geographical discontinuity and demographic disproportionality. This knowledge is fundamental for the design and development of pharmacogenetic research and genetic studies of common diseases, such as cardiovascular and autoimmune diseases.

The human leukocyte antigen (HLA) genes, a central component of the major histocompatibility complex (MHC) on 6p21.3, encode polymorphic class I, II and III molecules that play a major role in the immune response [11]. In addition, HLA loci are characterized by high levels of polymorphism and linkage disequilibrium (LD), characteristics that are important for studying the genetic background of human populations, as well as their present-day genetic structure. Here, we analyse the allele frequencies and the extent of LD of HLA class I and II loci, in order to characterize their diversity and haplotype distribution and to gain further insight into the potential use of this genomic region for the study of autoimmune diseases in the São Miguel Island population.

Materials and methods

Population samples, genotyping and statistical analysis

The sample set was composed of 106 healthy blood donors living in São Miguel Island, obtained from the anonymized Azorean DNA bank located at the Hospital of Divino Espírito Santo of Ponta Delgada, EPE, the main hospital in the Azores [12]. HLA class I (-A, -Cw and -B) and class II (-DRB1, -DQB1, -DPA1 and -DPB1) genotyping was performed by PCR-SSP Olerup SSP™ (GenoVision Inc.), according to the manufacturer's instructions. PCR products were visualized after electrophoresis on a 4% agarose gel stained with SYBR® Green, and HLA alleles were then identified using the Helmberg-SCORE™ software version 3.320T (Olerup SSP AB, Saltsjöbaden, Sweden).

Average gene diversity and HLA haplotypes were estimated using Arlequin v3.0 [13]. The standardized multiallelic disequilibrium coefficient, D', was evaluated with the Haploxt application from the GOLD software. Average D' values were calculated as the arithmetic mean of all values obtained for each marker pair. Nei's FST genetic distance matrix was computed between pairs of populations with DISPAN [14] and used to construct a Neighbor-Joining (NJ) tree with PHYLIP 3.63 [15]. We employed TreeView 1.6.6 [16] to display the NJ trees. To obtain the best results for the population comparisons, a compromise was made between the number of populations and the number of HLA loci. Consequently, HLA-DPA1 and -DPB1 were excluded from this analysis. FST values were based on allele frequencies obtained from an online database (HLA-Allele Project; http://www.allelefrequencies.net/) for 5 HLA loci in 19 populations: São Miguel, Terceira, Italy, France, Germany, Belgium, Turkey, Morocco, Japan, Mongolia Oold, Mongolia Tsaatan, Mongolia Khalkha, Basque, Ibiza, Majorca, Majorca Jewish, Chuetas, Minorca and Jordan. In addition to the FST values, 5-locus haplotypes were searched in the same database to further investigate the possible origins of the early settlers.
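The averaging of D' described above can be illustrated with a minimal sketch. It assumes that D' for each allele pair is Lewontin's standardized coefficient and that the locus-pair value is the unweighted mean of the absolute values over all allele pairs; this is an illustration of that reading, not the Haploxt implementation. The function names and the haplotype frequencies are hypothetical.

```python
from itertools import product


def d_prime(p_ab, p_a, p_b):
    """Lewontin's |D'| for one allele pair, from haplotype and allele frequencies."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return 0.0 if d_max == 0 else abs(d) / d_max


def mean_d_prime(hap_freqs):
    """Unweighted mean of |D'| over all allele pairs of two loci.

    hap_freqs maps (allele_at_locus_1, allele_at_locus_2) -> haplotype frequency.
    """
    freq_a, freq_b = {}, {}
    for (a, b), f in hap_freqs.items():
        freq_a[a] = freq_a.get(a, 0.0) + f  # marginal allele frequency, locus 1
        freq_b[b] = freq_b.get(b, 0.0) + f  # marginal allele frequency, locus 2
    values = [
        d_prime(hap_freqs.get((a, b), 0.0), freq_a[a], freq_b[b])
        for a, b in product(freq_a, freq_b)
    ]
    return sum(values) / len(values)


# Hypothetical haplotype frequencies (not data from this study):
# two loci in complete association.
example = {("A1", "B1"): 0.5, ("A2", "B2"): 0.5}
print(mean_d_prime(example))
```

A frequency-weighted mean would give more influence to common alleles; the unweighted form is used here only because the text describes a simple mean.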

Results

The analysis of the HLA alleles in the São Miguel Island population (Table 1) revealed a total of 16 different HLA-A, 13 HLA-Cw and 24 HLA-B alleles. Regarding HLA class II loci, we found 22 HLA-DPB1, 13 HLA-DRB1, 5 HLA-DQB1 and 6 HLA-DPA1 different alleles. HLA-B and HLA-DPB1 are the two loci with the highest numbers of alleles, suggesting higher diversity for these markers. The highest allele frequency observed, 0.462, was in the HLA-DPA1 gene, which shows a low number of alleles. In contrast, the lowest frequency identified (0.005) was found in HLA-A, -B and -DPB1 (Table 1). Genetic diversity values ranged from 0.821 for both HLA-DPA1 and -DQB1 to 0.934 for HLA-B, with a mean value of 0.846 (Table 2). Overall, HLA allele frequencies in São Miguel, mainland Portugal and other European populations showed no statistically significant differences (GST = 0.03; data not shown). According to Wright [17], values of GST smaller than 0.05 indicate little genetic differentiation.
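As an illustration of how a per-locus gene diversity value of this kind can be obtained, the sketch below uses Nei's unbiased estimator, H = n/(n-1) × (1 - Σp²), where n is the number of gene copies sampled. The function name and the allele frequencies are hypothetical and are not taken from Table 1.

```python
def gene_diversity(allele_freqs, n_copies):
    """Nei's unbiased gene diversity for one locus.

    allele_freqs: allele frequencies summing to 1.
    n_copies: number of gene copies sampled (2 x individuals for a diploid locus).
    """
    sum_sq = sum(p * p for p in allele_freqs)
    return n_copies / (n_copies - 1) * (1.0 - sum_sq)


# Hypothetical locus with five alleles, sampled in 106 individuals (212 copies).
freqs = [0.4, 0.3, 0.15, 0.1, 0.05]
print(round(gene_diversity(freqs, 2 * 106), 3))
```

A locus with many alleles at similar frequencies gives values close to 1, whereas a locus dominated by one allele gives values close to 0.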

Table 1 HLA class I and II allele frequencies in São Miguel population (the highest values are in bold).
Table 2 Gene diversity (GD) and linkage disequilibrium (D') values for 7 HLA loci in São Miguel Island population.

Considering the 7 HLA loci, haplotype determination revealed a total of 176 different haplotypes, corresponding to a discriminatory power of 83.0%. Five-locus HLA-A-Cw-B-DRB1-DQB1 haplotypes were also analysed (see Additional file 1 for details). The results indicate that A*01-Cw*07-B*08-DRB1*03-DQB1*02 is the most frequent haplotype in São Miguel (7.9%), followed by A*24-Cw*07-B*08-DRB1*03-DQB1*02 (3.8%). Both A*02-Cw*05-B*44-DRB1*04-DQB1*03 and A*29-Cw*16-B*44-DRB1*07-DQB1*02 are present at a frequency of 1.9%. A total of 157 haplotypes were matched against worldwide populations (HLA-Allele Project; http://www.allelefrequencies.net/). The results showed that the second most frequent haplotype, described above, appears only in Tunisia. Moreover, just 9 haplotypes (Haplotype number - HN - 1, 29, 37, 42, 84, 85, 101, 104 and 112; see Additional file 1 for details) were identified in this database.
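The discriminatory power quoted above is consistent with dividing the number of distinct haplotypes by the number of chromosomes sampled (2 × 106 = 212). The short sketch below reproduces that arithmetic; the choice of denominator is an assumption inferred from the reported figure, and the function name is hypothetical.

```python
def discriminatory_power(n_distinct_haplotypes, n_individuals):
    """Fraction of sampled chromosomes carrying a distinct haplotype (diploid sample)."""
    n_chromosomes = 2 * n_individuals
    return n_distinct_haplotypes / n_chromosomes


# 176 distinct 7-locus haplotypes in 106 individuals, as reported in the text.
print(f"{discriminatory_power(176, 106):.1%}")  # 83.0%
```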

Linkage disequilibrium was assessed using the standardized multiallelic disequilibrium coefficient, D'. D' values range from 0.163 for the HLA markers DPA1-DQB1 to 0.712 for DQB1-DRB1 (Table 2), with an average of 0.285 across the 7 loci. Curiously, the physically closest markers (DPA1-DPB1, 0.011 Mb; D' = 0.398) do not present the highest value of D' (DQB1-DRB1, 0.081 Mb; D' = 0.712). The correlation between distance (Mb) and D' is poor, although LD values decrease as physical distance increases, as expected.

In order to obtain a graphical view of the genetic similarity between São Miguel (106 individuals, 5 HLA loci) and other populations, we computed Nei's genetic distances and depicted them in Figure 1. Interestingly, São Miguel is closer to the Moroccan population than to Terceira, another Azorean island. Nevertheless, in general, both populations cluster with the Europeans.

Figure 1 Neighbor-Joining tree comparing 5 HLA loci in 19 populations. São Miguel clusters with Europeans and Moroccans.

Discussion

Extensive studies have been performed in several geographical areas to characterize the diversity of HLA genetic markers. These evaluations allow a better knowledge of population structure based on non-neutral markers, as well as an understanding of the influence of evolutionary processes on the overall signature of a population. These genetic data are crucial for understanding the molecular etiology and epidemiology of common diseases. In general, the data presented here corroborate previous works [3, 6-10], in which Azoreans, including São Miguel islanders, show high values of genetic diversity when compared to mainland Portugal and other European populations. This may be a direct consequence of the Azorean settlement, with a major contribution of mainland Portuguese (~60%) and, to a lesser extent, Flemish, Spanish, French, Italians, Germans, Scottish, Jews, Moors and blacks from Guinea, Cape Verde and São Tomé. Previous studies of HLA markers in mainland Portugal (3 loci, -A, -B and -DRB1, [18]) and in the Azores (6 loci, -A, -Cw, -B, -DRB1, -DQA1 and -DQB1, [5]) report an average diversity of 0.92 in both populations. The results obtained in the present study, based on 7 loci, showed a smaller value (0.84). This may be explained by the fact that Spinola et al. [5] used a high-resolution methodology to genotype HLA. Because high-resolution typing distinguishes alleles such as A*0101 and A*0102 instead of grouping them as a single allele (A*01), it identifies a higher number of different alleles. Nonetheless, the data show no significant differences between allele frequencies in São Miguel and Terceira islands. Considering the distribution of HLA alleles, the presence in São Miguel of -A*30 and -A*80, commonly found in sub-Saharan populations [19-21], supports historical records of slave settlers. In addition, the presence of alleles -B*35, -B*57 and -B*15 suggests a direct contribution of Moorish prisoners in the Azores [22-24]. Nevertheless, the influence of early Portuguese settlers cannot be ruled out, since allele frequencies are similar. In general, these results are corroborated by the NJ tree (Figure 1), where São Miguel shows the influence of both African and European populations.

Linkage disequilibrium is considered a good measure of population structure. According to Sanchez-Mazas [25], HLA-DPB1, located on the centromeric side of the HLA chromosomal region, does not show high values of LD with the other HLA loci. Interestingly, in the present study, the lowest values of D' observed involve this marker. This result can be explained by a region of high recombination, involving one or several hotspots, which separates HLA-DPB1 from the other HLA loci. Abecasis et al. [26] note that a value of D' = 0.33, which corresponds to a 10-fold increase in the required sample size, is commonly taken as the minimum usable amount of LD. Of the 21 possible pairwise combinations of HLA loci, 17 showed values below 0.33, and only 2 (Cw-B and DQB1-DRB1) showed markedly higher values (0.571 and 0.712, respectively). The HLA data reported by Meyer et al. [27] indicate significant LD between all HLA loci in around 40 populations studied worldwide. The present research did not find large D' values and corroborates the results obtained by Service et al. [28] and Branco et al. [9, 10], in which the Azoreans have the lowest values of LD when compared with isolated and outbred populations.
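The count of 21 locus combinations follows from taking all unordered pairs of the 7 typed loci. The sketch below enumerates them and applies the 0.33 threshold to the four D' values quoted in the text; the remaining pairs are given in Table 2 and are not reproduced here. The variable and function names are hypothetical.

```python
from itertools import combinations

LOCI = ["A", "Cw", "B", "DRB1", "DQB1", "DPA1", "DPB1"]

# All unordered pairs of the 7 loci.
pairs = [frozenset(pair) for pair in combinations(LOCI, 2)]
print(len(pairs))  # 21

# Only the D' values quoted in the text.
reported_d_prime = {
    frozenset({"DPA1", "DQB1"}): 0.163,
    frozenset({"DPA1", "DPB1"}): 0.398,
    frozenset({"Cw", "B"}): 0.571,
    frozenset({"DQB1", "DRB1"}): 0.712,
}


def below_threshold(d_values, threshold=0.33):
    """Names of locus pairs whose D' falls below the chosen threshold."""
    return sorted(
        "-".join(sorted(pair)) for pair, d in d_values.items() if d < threshold
    )


print(below_threshold(reported_d_prime))  # ['DPA1-DQB1']
```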

HLA diversity in human populations is an important aspect of disease epidemiology, especially for autoimmune disorders such as type I diabetes, ankylosing spondylitis and celiac disease. According to Bakker et al. [29], the association of HLA alleles and/or haplotypes with disease susceptibility may be confounded by the presence of population stratification in neighboring HLA and non-HLA genomic regions. The high variability of HLA and the absence of genetic structure and of extensive LD demonstrated here suggest that studies of autoimmune diseases in São Miguel islanders will necessarily require a more focused analysis of HLA extended haplotypes, as well as the evaluation of other non-HLA candidate genes.