Polymorphic Alu insertions and the genetic structure of Iberian Basques
- Cite this article as:
- García-Obregón, S., Alfonso-Sánchez, M.A., Pérez-Miranda, A.M. et al. J Hum Genet (2007) 52: 317. doi:10.1007/s10038-007-0114-9
Eight Alu sequences (ACE, TPA25, PV92, APO, FXIIIB, D1, A25 and B65) were analyzed in two samples from Navarre and Guipúzcoa provinces (Basque Country, Spain). Alu data for other European, Caucasus and North African populations were compiled from the literature for comparison purposes to assess the genetic relationships of the Basques in a broader geographic context. Results of both MDS plot and AMOVA revealed spatial heterogeneity among these three population clusters clearly defined by geography. On the contrary, no substantial genetic heterogeneity was found between the Basque samples, or between Basques and other Europeans (excluding Caucasus populations). Moreover, the genetic information obtained from Alu data conflicts with hypotheses linking the origin of Basques with populations from North Africa (Berbers) or from the Caucasus region (Georgia). In order to explain the reduced genetic heterogeneity detected by Alu insertions among Basque subpopulations, values of the Wright’s FST statistic were estimated for both Alu markers and a set of short tandem repeats (STRs) in terms of two geographical scales: (1) the Basque Country, (2) Europe (including Basques). In the Basque area, estimates of Wahlund’s effect for both genetic markers showed no statistical difference between Basque subpopulations. However, when this analysis was performed on a European scale, FST values were significantly higher for Alu insertions than for STR alleles. From these results, we suggest that the spatial heterogeneity of the Basque gene pool identified in previous polymorphism studies is relatively recent and probably caused by a differential process of genetic admixture with non-Basque neighboring populations modulated by the effect of a linguistic barrier to random mating.
Keywords: Alu insertions, Genetic heterogeneity, Gene flow, Wright’s FST, Basques
The systematic study of different types of molecular polymorphisms in a human population can provide detailed information on the demographic and micro-evolutionary processes that have affected it over time. Such polymorphisms include a number of DNA markers located in the non-recombining region of the Y-chromosome (NRY) with the potential to provide information on male-specific patterns of migration in the past, namely Y-chromosomal short tandem repeats and single nucleotide polymorphisms (Y-STRs, Y-SNPs), whereas analyses of the mitochondrial DNA (mtDNA) can assist in the clarification of female-mediated migration episodes. Still other polymorphisms are useful for describing the joint evolution of the maternal and paternal lineages (Jorde et al. 1995; Lell and Wallace 2000; Richards 2003). At a different level, there are molecular markers that have changed little throughout evolution, such as Alu elements and SNPs, whereas other markers stand out as having high mutation rates, such as STRs (Jorde et al. 1997; Watkins et al. 2001, 2003). The more conservative of these polymorphisms are considered to be suitable markers for studying population phylogenetics due to their abundance in the genome and their low mutation rates. The less conservative, hypervariable DNA markers reveal data on the most recent micro-evolutionary changes undergone by a given population.
Alu elements represent around 10% of the human genome, with around 1,400,000 copies distributed throughout. These insertions are the most abundant short interspersed elements (SINEs) in the human genome, and they are defined as sequences of approximately 300 base pairs (bp) in length ancestrally originating from the 7SL RNA gene by retro-transposition (reviewed in Batzer and Deininger 2002). Some interesting characteristics of the Alu insertions make them valuable markers in phylogenetic analyses. The main feature of Alu polymorphism is stability: it is highly unlikely that the same Alu insertion could occur more than once independently at the same locus, so alleles that share an insertion are identical by descent. This means that polymorphic Alu insertions reflect, in general, unique evolutionary events. Furthermore, the absence of the insertion is known to be the ancestral state, which is an advantageous attribute in investigations focused on disentangling the demographic, genetic and evolutionary history of the human species (Batzer et al. 1994).
It is possible that the study of Alu insertions would shed some light on human evolutionary genetic processes in the European and Mediterranean context that are currently being debated among specialists. These include (1) the origin of Basques and the genetic heterogeneity of the present-day Basque population; (2) the impact of North African migrations on the genetic background of the Iberian populations; (3) the peopling of Europe. The choice of polymorphisms with a suitable level of resolution and their analysis in a large enough series of representative samples from different geographic regions should provide interesting data to help clarify these questions. Therefore, one of the main aims of this study, which is a continuation of previous investigations carried out by the authors (de Pancorbo et al. 2001; Peña et al. 2002; Pérez-Miranda et al. 2003, 2004, 2005; Alfonso-Sánchez et al. 2006; García-Obregón et al. 2006), was to create a robust genetic database which may help in the gradual reconstruction of the peopling processes of Europe and the Mediterranean Basin.
In this study, we have genetically characterized two samples of Basques based on eight polymorphic Alu insertions (ACE, TPA25, PV92, APO, FXIIIB, D1, A25 and B65), which were selected bearing in mind the availability of databases for comparisons. Samples were collected from the autochthonous population settled in Navarre and Guipúzcoa provinces, both of which are located in the historical Basque territory (Northern Spain). These areas were selected for two basic reasons: (1) Northern Navarre and Guipúzcoa are the two Basque regions where the Basque language (Euskera) has traditionally been more deep-rooted (Alfonso-Sánchez et al. 2005); (2) Navarre is the only Basque region for which Alu insertion frequencies are not available. The findings on Alu diversity in these native Basque populations were then examined in a broader geographic context. To that end, Alu data for European, Caucasus and North African populations were compiled from the literature. With this integrative approach we sought to analyze the degree of genetic heterogeneity among geographical subpopulations of the Basque Country as well as to assess the genetic relationships of the autochthonous Basques with North African and Caucasus populations, since both of the latter population groups have been linked to the origin of Basques in some previous works (Arnaiz-Villena et al. 1997; Calderón et al. 1998). In addition, Alu data presented herein may be useful in the interpretation of gene flow processes that have contributed to the shaping of the European gene pool.
Material and methods
The Basque area is located at the western end of the Pyrenees on the Bay of Biscay area of the Atlantic Coast, astride the border between France and Spain. For further details on the geographic, demographic and linguistic characteristics of the study area (Guipúzcoa and Navarre provinces), the reader is referred to previous publications (Pérez-Miranda et al. 2003, 2005).
Whole blood samples were collected in EDTA vacutainer tubes by venipuncture from unrelated healthy individuals living in Guipúzcoa province (n = 94) and in the northern fringe of Navarre province (n = 109). Only autochthonous (native) Basque individuals were included in the present analysis. In this study, Basque surnames and birthplaces of individuals and ancestors (recorded back to the third generation) were the criteria employed to define local autochthony; therefore, all donors were interviewed to obtain information on the geographical origins of their parents and grandparents. Ethical guidelines for research with humans were adhered to as stipulated by the Ethical Committee of the University of the Basque Country. All blood donors gave their informed consent prior to inclusion in the sample.
Genomic DNA was extracted from peripheral blood using the standard phenol-chloroform procedure and stored at −20°C. Eight autosomal Alu insertions (ACE, TPA25, PV92, APO, FXIIIB, D1, A25 and B65) were genotyped in both samples. Additional information on the PCR amplification conditions and agarose gel electrophoresis can be found in García-Obregón et al. (2006).
Allelic frequencies for the eight Alu loci typed in the Navarre and Guipúzcoa collections were calculated by direct counting. Gene diversity (GD) was computed using the Power Marker v. 3.0 program (Liu and Muse 2004). To test for Hardy-Weinberg equilibrium (HWE), we carried out a Fisher’s exact probability test to estimate P values (Guo and Thompson 1992) using Arlequin v. 3.0 (Excoffier et al. 2005).
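The direct-counting step described above can be illustrated with a minimal Python sketch. The function name alu_locus_summary and the genotype counts in the usage lines are hypothetical. The binomial standard error and the uncorrected gene diversity (one minus the sum of squared allele frequencies) are assumptions for illustration; the software cited in the text may apply small-sample corrections that this sketch omits.

```python
from math import sqrt


def alu_locus_summary(n_ins_ins, n_ins_abs, n_abs_abs):
    """Summarise one biallelic Alu locus from genotype counts.

    n_ins_ins: individuals homozygous for the insertion
    n_ins_abs: heterozygous individuals
    n_abs_abs: individuals homozygous for the absence of the insertion
    Returns (insertion frequency, binomial standard error, gene diversity).
    """
    n_individuals = n_ins_ins + n_ins_abs + n_abs_abs
    if n_individuals == 0:
        raise ValueError("at least one genotyped individual is required")

    n_chromosomes = 2 * n_individuals
    # Direct counting: each homozygote carries two insertion alleles,
    # each heterozygote carries one.
    p = (2 * n_ins_ins + n_ins_abs) / n_chromosomes

    # Assumed binomial standard error over sampled chromosomes.
    se = sqrt(p * (1 - p) / n_chromosomes)

    # Uncorrected gene diversity: 1 - (p^2 + q^2) = 2pq for two alleles.
    gd = 1 - (p ** 2 + (1 - p) ** 2)
    return p, se, gd


# Hypothetical genotype counts, not data from the study.
freq, std_err, diversity = alu_locus_summary(20, 40, 40)
print(f"p = {freq:.3f} ± {std_err:.3f}, GD = {diversity:.3f}")
```

Gene diversity is largest when the insertion frequency is close to 0.50 and falls towards zero as a locus approaches fixation.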
Finally, with the aim of interpreting the genetic heterogeneity observed among Basque subpopulations and assessing Wahlund’s effect, Wright’s FST statistic values were estimated from both Alu insertions and a series of forensic STRs. To that end, allelic frequencies for several European and North African samples compiled in a previous publication (Pérez-Miranda et al. 2005) were considered. The statistical significance of the differences in Wright’s FST values obtained for both types of genetic markers was then assessed using the Mann-Whitney nonparametric U test.
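To make the link between Wright’s FST and Wahlund’s variance concrete, the sketch below computes FST for a single biallelic locus as the variance of subpopulation allele frequencies divided by its maximum possible value. This textbook definition is an assumption: the text does not state which estimator was used. The function name wright_fst and its optional weights argument are hypothetical, and the two frequencies in the usage lines (0.425 and 0.320, the first pair in the frequency list) serve only as an illustration.

```python
def wright_fst(freqs, weights=None):
    """Wright's FST for one biallelic locus.

    freqs: allele (insertion) frequency in each subpopulation
    weights: optional relative subpopulation sizes (equal if omitted)
    Returns Var(p) / (p_bar * (1 - p_bar)), the standardised Wahlund variance.
    """
    k = len(freqs)
    if k < 2:
        raise ValueError("at least two subpopulations are required")
    if weights is None:
        weights = [1.0] * k
    if len(weights) != k:
        raise ValueError("weights and freqs must have the same length")

    total = sum(weights)
    # Weighted mean allele frequency across subpopulations.
    p_bar = sum(w * p for w, p in zip(weights, freqs)) / total
    # Wahlund variance: weighted variance of frequencies among subpopulations.
    var_p = sum(w * (p - p_bar) ** 2 for w, p in zip(weights, freqs)) / total

    max_var = p_bar * (1 - p_bar)
    if max_var == 0:
        # Locus fixed in every subpopulation: no differentiation to measure.
        return 0.0
    return var_p / max_var


# Illustration only: two subpopulation frequencies taken from the frequency list.
print(round(wright_fst([0.425, 0.320]), 4))
```

Values near zero indicate little differentiation among subpopulations; a value of one means that each subpopulation is fixed for a different allele.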
Table 1 Alu insertion frequencies with their standard errors (±SE) and gene diversity (GD) in Basque samples from Guipúzcoa and Navarre provinces (Spain)
Locus     Guipúzcoa          Navarre
ACE       0.425 ± 0.033      0.320 ± 0.027
TPA25     0.553 ± 0.034      0.592 ± 0.034
PV92      0.239 ± 0.0387b    0.154 ± 0.026
APO       0.950 ± 0.017      0.970 ± 0.011
FXIIIB    0.441 ± 0.036      0.490 ± 0.034
D1        0.473 ± 0.034      0.372 ± 0.032
A25       0.156 ± 0.025      0.124 ± 0.021
B65       0.557 ± 0.032      0.466 ± 0.035
Alu insertions are classed as biallelic markers according to the origin of their polymorphism. Interestingly, the analysis of the PV92 locus revealed three individuals heterozygous for a third allele in the sample from Guipúzcoa province. This variant consisted of a second Alu insertion in an existing Alu element. To date, this uncommon allele has been described only in Basque and North African populations (Comas et al. 2001).
The degree of genetic variability in the Basque samples was assessed by computing the GD for each locus (see Table 1). As expected, the lowest figures of GD were observed in the Alu loci closest to fixation (APO), both in Guipúzcoa (0.094) and in Navarre (0.058). On the contrary, the highest GD values were found in those Alu markers showing insertion frequencies of around 0.50. These were the cases of D1 in Guipúzcoa (0.498) and of FXIIIB in Navarre (0.500).
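As a rough check on the figures quoted above, the gene diversity of a biallelic locus with insertion frequency p can be written in its uncorrected form and evaluated for the loci mentioned. Two assumptions apply: the flattened frequency list is taken to follow the locus order given in the methods, with Guipúzcoa listed before Navarre, and the uncorrected formula is used even though the software may apply a small-sample correction. Small differences in the third decimal place are therefore to be expected.

```latex
\begin{align*}
\mathrm{GD} &= 1 - p^{2} - (1-p)^{2} = 2p(1-p) \\
\text{APO, Guip\'uzcoa: } & 2(0.950)(0.050) = 0.095 \\
\text{APO, Navarre: } & 2(0.970)(0.030) \approx 0.058 \\
\text{D1, Guip\'uzcoa: } & 2(0.473)(0.527) \approx 0.499 \\
\text{FXIIIB, Navarre: } & 2(0.490)(0.510) \approx 0.500
\end{align*}
```

Because 2p(1 − p) peaks at p = 0.5, loci with insertion frequencies near 0.50 give the highest GD, and loci close to fixation give the lowest.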
For a thorough analysis of the genetic structure of Basques, data on Alu insertions from previous publications were compiled. This database included a sample of native Basques with non-specified origin, from now on referred to as “general” Basques (Comas et al. 2000). We also included the autochthonous samples studied in the article by de Pancorbo et al. (2001), namely Basques from Biscay, Alava and Guipúzcoa, in addition to a sample from the resident population of the Basque Country (not all individuals are, necessarily, native people). Because Alu markers A25 and B65 were not genotyped in most of the cited samples, these two insertions were not considered in our analysis. Although the exact test of population differentiation (data not shown) failed to detect significant genetic heterogeneity for the whole set of Basque samples, some significant differences were found for several Alu loci in the following pairs of populations: Alava and Navarre (ACE, P = 0.036), Guipúzcoa (this study) and Biscay (D1, P = 0.007) and Navarre and the resident population (APO, P = 0.035).
Firstly, the Caucasus populations can be seen to display a very high genetic heterogeneity despite their limited geographical distribution, as can be inferred from their remarkable dispersion along both dimension I and dimension II of the MDS plot. In particular, there were conspicuous separations between the plots of the collections from Kabardinia (KABR), Azerbaijan (AZER) and Armenia (ARMN); these collections were also markedly distant from the rest of the populations included in the analysis. On the other hand, Cherkessians (CHER) and Georgians (GEOR) grouped closer to the European cluster. The North African cluster seems to show only a moderate within-group genetic heterogeneity. These samples grouped mostly in the upper-right quadrant delimited by the positive segments of both dimension I and II and visibly segregated from the Caucasus and European populations. A third major grouping consists of the European populations not located in the Caucasus; this grouping is concentrated very close to the centroid of the bidimensional representation and constitutes a very homogeneous cluster from a genetic standpoint. Two populations are worth highlighting within this group: Albania (ALBN), with Alu insertion frequencies similar to those estimated in GEOR, and the Canary Isles (CNRI), whose position is in agreement with their historical origins and with a geographic location intermediate between the European and North African samples. More specifically, the Basque samples were dispersed within the European cluster, staying distant from both the Caucasus and North African populations.
Results of the analysis of molecular variance (FST, FSC and FCT) for eight Alu loci in three different population clusters classified according to geography
In human evolutionary studies, paternal (Y-chromosome) and maternal (mtDNA) lineages have been extensively analyzed to characterize human groups in terms of origin, demographic milestones with genetic implications (founder effects, bottlenecks) and genetic variability. However, the reconstruction of the history of human populations from genetic data is a complex task that also requires information from the recombining parts of the nuclear DNA, namely, the autosomes (Kidd et al. 2000). Polymorphic Alu insertions represent an important source of nuclear genetic diversity. Alu elements are highly stable markers that are not affected by substantial mutation rates. Consequently, Alu repeats form a rich molecular fossil record that is faithfully recorded in the human genome from generation to generation, thereby representing an advantageous characteristic in phylogenetic studies. In addition, Alu elements are widely dispersed throughout the human genome, subject to extremely limited amounts of gene conversion, and selectively neutral (Batzer et al. 1996; Comas et al. 2000). In terms of selection, Cordaux et al. (2006) have suggested that most young Alu elements can be considered to be neutral residents of the human genome. With mutation and selection ruled out, the analysis of Alu insertions should solely reflect processes of interaction between gene flow and genetic drift in the past.
The genetic structure emerging from MDS and AMOVA analyses indicates a substantial heterogeneity among the populations considered, which is basically distributed between three different geographic areas: the Caucasus, the rest of Europe and North Africa. It is worth underscoring the presence of a significant Alu diversity among the Caucasus populations. The high degree of genetic heterogeneity observed in the Caucasus region is probably caused by the effect of genetic drift, which is promoted by the small population sizes and the relative isolation resulting from the complex orography of this geographic zone, as has been suggested in previous works (Nasidze et al. 2001; Alfonso-Sánchez et al. 2006). This reasoning is also a valid explanation for the differentiation between Caucasus and European populations.
In the MDS plot, the samples representing the Basque Country appear to be clearly distanced from the Caucasus populations. This result conflicts with previous findings regarding the variability of the immunoglobulin (GM and KM) genes, whose authors enunciated a hypothesis linking the origin of Basques with a small Neolithic North Caucasian population (Calderón et al. 1998). Within this global picture of Alu heterogeneity between Basque samples and populations from the Caucasus region, particular interest should be given to the scanty genetic affinity between the Basques and the Georgians. Both of these human groups are considered to be linguistic isolates. It has been suggested that Euskera (Basque language) and Kartvelian (Swanetian language) are the remnants of pre-Indo-European languages of Paleolithic antiquity (Renfrew 1991). A recent linguistic classification has the Euskera sharing the same cluster with the Caucasian languages (Chen et al. 1995). Based mainly on such linguistic criteria, some experts have often postulated that Basques and Georgians have a common Upper Paleolithic background (Gamkrelidze and Ivanov 1990; Ruhlen 1991). However, Alu markers examined in the present study failed to detect close genetic affinity between these groups, as can be inferred from their positions in the two-dimensional representation. It can be concluded, therefore, that Alu data do not support the putative relationship between language and genes in Basques and Georgians. This finding is compatible with postulates of several previous studies, based either on classical markers (Bertorelle et al. 1995) or on DNA molecular markers (Alfonso-Sánchez et al. 2006). Likewise, the genetic information derived from the analysis of Alu elements does not sustain the notion of a common origin between Basques and North Africans (Berbers), as has been proposed in earlier population genetic surveys (see Arnaiz-Villena et al. 1997). 
In fact, the samples with Berber origins (North Morocco and Southeastern Morocco) appear to be interspersed within the North African cluster and clearly segregated from the Basques.
The genetic heterogeneity found between North Africa and Europe based on Alu diversity is consistent with results reported in previous studies using classical genetic markers (Bosch et al. 1997; Simoni et al. 1999), microsatellites or STRs (Bosch et al. 2000), Y-chromosome STRs (Bosch et al. 1999; Manni et al. 2002; Semino et al. 2004), polymorphic Alu insertions (Comas et al. 2000, García-Obregón et al. 2006) and HLA-class II loci (Pérez-Miranda et al. 2003, 2004). Several investigations claim that the genetic discontinuity among populations from the two Mediterranean shores could be related to a strong barrier to gene flow between Africa and Europe at the Strait of Gibraltar (Simoni et al. 1999; Comas et al. 2000; Manni et al. 2002). It has also been argued that a Mesolithic (or older) in situ differentiation of the human groups established in northwestern Africa accounts for the abovementioned genetic differentiation (Bosch et al. 1997).
With respect to the group of European populations (excluding the Caucasus region), no clear geographic structuring of the Alu diversity was observed; therefore, the notion of heterogeneity due to isolation-by-distance can be ruled out. The Basque collections included in the MDS analysis remain grouped with the vast majority of European populations. Both the MDS and the AMOVA analyses based on Alu data denote a lack of significant genetic heterogeneity, both among Basque subpopulations and among autochthonous Basques and other neighboring populations. However, there is a sizeable number of publications which report that the Basques have gene frequencies that are notably different from the expected ones for a population theoretically inserted—according to geographic location—within the European genetic landscape. In such studies, a wide range of genetic markers have been analyzed, including blood group systems ABO (Boyd and Boyd 1937), Rh (Etcheverry 1945) and Duffy (Levine et al. 1977), some Y-chromosome DNA haplotypes (Lucotte and Hazout 1996), mtDNA haplogroups (Torroni et al. 1998, 2001), immunoglobulin allotypes (Calderón et al. 1998), HLA class-II genes (Pérez-Miranda et al. 2003, 2004) and autosomal STRs (Pérez-Miranda et al. 2005), among others. Specifically within the scope of the Basque area, the bulk of the population genetic studies identify spatial substructuring of the Basque gene pool (Goedde et al. 1972, 1973; Aguirre et al. 1991; Calderón et al. 1998; Pérez-Miranda et al. 2003, 2005, among others), although this topic has been questioned in a few publications (Calafell and Bertranpetit 1994; Comas et al. 1998).
In an attempt to analyze the discrepancies in the degree of genetic heterogeneity detected by different types of polymorphism in the Basque gene pool, Wright’s FST values computed from both Alu insertions and from other genetic loci not subject to selective pressures (autosomal STRs) were compared in two different databases: (1) only subpopulations of the Iberian Basque Country, (2) other European samples in addition to the Basque subpopulations. At the continental scale, Alu insertions showed a genetic heterogeneity visibly higher than that obtained for autosomal STRs, whereas in the context of the Basque area the values of Wahlund’s variance were similar for both types of polymorphic markers.
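The comparison of FST values between the two marker types relies on the Mann-Whitney U test named in the methods. The sketch below is a minimal standard-library version of that test, using midranks for ties and a two-sided normal approximation without continuity correction. The function name mann_whitney_u is hypothetical, and the FST values in the usage lines are invented for illustration; they are not results from the study. For small samples an exact test from a statistics package would be more appropriate.

```python
from math import erf, sqrt


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p_value), where U is the smaller of the two U statistics.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")

    # Pool the observations, remembering which sample each came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of ranks i+1 .. j.
        midrank = (i + 1 + j) / 2
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j

    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    u = min(u1, u2)

    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all observations identical

    z = (u - mean_u) / sqrt(var_u)  # z <= 0 because u is the smaller statistic
    p_value = min(1.0, 1 + erf(z / sqrt(2)))  # equals 2 * Phi(z)
    return u, p_value


# Hypothetical per-locus FST values for two marker types (illustration only).
fst_alu = [0.031, 0.045, 0.028, 0.052, 0.039, 0.047]
fst_str = [0.008, 0.012, 0.010, 0.015, 0.009, 0.011]
u_stat, p = mann_whitney_u(fst_alu, fst_str)
print(u_stat, p)
```

A rank-based test suits this comparison because per-locus FST values are few in number and need not follow a normal distribution.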
When we consider a broad geographic region inhabited by a group of populations with a relatively ancient common ancestor, Alu insertions are expected to have higher values of Wright’s FST than the autosomal STRs. The microsatellites are characterized by relatively high mutation rates that range between 3.3 × 10⁻⁴ per locus per generation (25 years per generation) (Forster et al. 2000) and 15.2 × 10⁻⁴ per locus per generation (Zhivotovsky et al. 2004). The combined action of the high mutation rate of human microsatellites with the homogenizing effect typical of gene flow may eliminate or dilute the genetic information on the origins of a given population by blurring any signal of phylogenetic affinity with ancient human groups. In contrast, because Alu insertions are unique events and, consequently, much more conservative, they are not exposed to the random fluctuations caused by mutation. Therefore, Alu elements will better reflect the common ancestral origin of the populations of a given geographic region. Within the context of the Basque area, polymorphic Alu insertions showed the same level of genetic heterogeneity as the autosomal STRs. For this reason, we suggest that the spatial substructuring detected in the Basque gene pool from the analysis of diverse genetic markers (see references above) could have had a relatively recent origin. In effect, if the population subdivision unveiled by a group of polymorphic markers had begun to take place at some stage close to the origin of the Basque population, this fact should be reflected in Wright’s FST having greater values for Alu insertions than for microsatellites. The observable differences for some gene frequencies between the provinces of the Basque Country probably stem from recent admixture processes with surrounding populations; these processes would occur in variable degrees for each Basque region according to the predominance of the Basque language.
The Basque population is a genetic outlier in Europe. As we have seen above, many studies have exposed the singularities of the genetic background of the autochthonous Basque population based on both classical genetic markers and DNA molecular markers (reviews in de Pancorbo et al. 2001; Pérez-Miranda et al. 2005). In explaining the genetic distinctiveness of Basques, the most common scenario is random genetic drift and high inbreeding levels over long periods in association with isolation from surrounding populations. Because Basques represent a linguistic island within the Iberian Peninsula, some authors have suggested that the genetic isolation of the Basques may be a consequence of their peculiar language (Cavalli-Sforza et al. 1994; Calderón et al. 1998; de Pancorbo et al. 2001). Language is considered one of the major sociocultural factors restricting gene flow and population admixture in that it prevents the assimilation of immigrants into the native (recipient) population and increases ethnic endogamy, whose main consequence would be a departure from panmixia (Alfonso-Sánchez et al. 2001, 2005). Indeed, it is well documented that linguistic boundaries can be effective barriers to gene flow (Barbujani and Sokal 1990; Barbujani 1997). This seems to be the reason why Guipúzcoa province, which does not border on non-Basque (Romance-speaking) regions, often appears as the most differentiated Basque subpopulation in comparison to other Iberian and even Basque populations (see Fig. 1), as has been confirmed in different previous works (Calderón et al. 1998; de Pancorbo et al. 2001; Pérez-Miranda et al. 2005).
In summary, our analysis of eight Alu insertions revealed no significant genetic heterogeneity between Basque subpopulations or between Basques and other European populations. Based on a comparison of the genetic structure of the Basque subpopulations obtained from Alu insertions and autosomal STRs, it can be inferred that distinct polymorphic markers will reveal distinct domains of the evolutionary history of human populations, depending on the particular characteristics of each genetic locus in terms of level of polymorphism, mutation rate, potential action of back mutation, etc. The genetic heterogeneity between Basque subpopulations identified in several previous polymorphism studies seems to be relatively recent and caused by a differential process of genetic admixture with non-Basque neighboring populations, modulated by the effect of a linguistic barrier to random mating (panmixia). Although it cannot be stated with absolute certainty that the origin of the Basques is recent and that its gene pool is modeled solely by interplay between the isolation resulting from the linguistic barrier to panmictic matings and the effect of genetic drift, our analyses do support this hypothesis as the most plausible, bearing in mind the manifest contradiction between the Alu data presented herein and the hypotheses linking the origin of Basques with North Africa or the Caucasus region. It would be interesting to extend the number of Alu insertions studied in order to complement the genetic information provided by analyses of maternal (mtDNA) and paternal (Y-chromosome) lineages and reveal new clues about the origin and past demographic interactions of the Basque population.
This work was funded by the Ministerio de Ciencia y Tecnología (Spain), Grant BOS2002-01677, and by the Universidad del País Vasco/Euskal Herriko Unibertsitatea, Grant GIU 05/51. S. García-Obregón was supported by the ‘Programa de Formación de Investigadores’, Departamento de Educación, Universidades e Investigación (Basque Government). We are particularly grateful to all voluntary donors who contributed generously to the development of this study.