Background

Considering Basques as the direct descendants of Paleolithic Europeans has become a multidisciplinary premise. However, there is no irrefutable evidence for this supposition. The Basque country has a well represented archaeological record in Paleolithic and Mesolithic periods [1] but Archaeology can seldom differentiate an "in situ" cultural evolution from successive waves of new incomers. Basques speak a non Indo-European language with no close affinities with any other extant language but, even if their roots could be found, they would not reach the Paleolithic deepness due to the fast rate of change of languages. Classical population genetic studies, showed the Basques as one of the major outliers in Europe [2]. Nevertheless, these results can be explained by genetic drift which implies isolation but not necessarily an old history for that population. Lack of recombination and the fast mutation rate made mtDNA the ideal molecule to construct maternal genealogies, which frame in time and space the evolution and dispersion of human populations. However, until now, mtDNA studies on the Basques have only confirmed its low genetic diversity in a common Western Europe background [35]. It has been proposed, on the basis of their geographic distributions, that several mitochondrial lineages as V [6, 7], and the H1 and H3 subgroups [4, 5, 8] are markers of a Paleolithic human dispersal from southwestern Europe, including the Basque country, to Northeast Europe. However, diversities for these lineages are not higher in Basques than in Central Europeans. It could be possible that this lack of distinctness in Basques is real, in fact, even small levels of gene flow during enough time might have replaced the majority of their ancient lineages [9], but it could also be possible that this uniformity is due to a lack of resolution of the Basque mtDNA genealogies [10].

To deal with this possibility, we analyzed a sample of 211 unrelated Basques using the hypervariable segment of the mtDNA control region (HVSI-II) and diagnostic RFLP analysis, and sequenced the complete mitochondrial DNA of the rare lineages.

Results and discussion

The analysis of the Basque sample showed three haplotypes (CRS, 16342, and 16278 16311) that by their mutated positions in the control region had uncertain subhaplogroup adscription, but that by diagnostic RFLPs (+12308 Hinf I) belonged to haplogroup U/K. Also, we found one individual (16146 16189 16342) that belongs to the scarce U8a subhaplogroup. Complete sequencing of the four rare U haplotypes and their inclusion in a phylogenetic tree [11] with all published U complete sequences (data not shown) allowed their correct subhaplogroup affiliation. A more schematic tree (Fig.1) shows that one of the lineages (Bq24) belongs to subhaplogroup K1a1 [12], being a back-mutation of the diagnostic position 16224 its main peculiarity. Its most related K1a1 complete sequences are the eleven found by Herrnstadt et al. [13] and Finn 153 in Finnilä et al. [14]. Applying a mutation-rate of 1.26 × 10-8 [15] to their average sequence divergence [16] a radiation upper bound of 12 ± 4 Ky is obtained for this group. The other three lineages clustered into the rare subhaplogroup U8a [17]. This subhaplogroup can be RFLP diagnosed as -7055 Alu I. What is outstanding of these sequences is their great genetic diversity that extends the range of all known U8a European sequences (Fig.1). This Basque diversity specially contrasts with the lack of variation in the Finn sequences. Furthermore, the phylogenetic radiation of their U8a lineages (Fig. 1 and 2) is characteristic of an old population without recent exponential growth [18]. In fact, the most ancestral sequence (Bq1820) indicates that U8a lineages could have been in the Basque country since 28 ± 9 Ky, and that the other Basque lineages, belonging to the U8a1 subgroup (RFLP diagnosed as -3737 Hph I and + 5235 MspA1 I), participated in a more recent European expansion around 13 ± 5 Ky, similar to that estimated for K1a, and congruent with a re-expansion from an Iberian refuge when glaciers retreated in Europe proposed for other mtDNA clades [4, 5, 7]. Although all the U8a complete sequences belong to Europeans, the ancestral radiation of haplogroup U most probably occurred in western Asia shortly after the out of Africa episode [19], with early branch expansions to India (U2), Europe (U5) and North Africa (U6). U8 may be considered another main branch with a broad geographic range. Its first split separated U8a from U8b/K around 57 ± 11 Ky. Relatively short in time a new subdivision gave the sister clades U8b and K [17]. Until now there was only one completely sequenced U8b subject [20]. The addition of our Jordan 767 sequence to the tree (Fig. 1) gives a branching point for U8b, defined by transitions 6546, 6599 and 12771. Two of them, 6546 and 12771, can be MnlI-RFLP-detected. Curiously, the similar number of substitutions to the coalescence point of the three U8 branches U8a (5), U8b(5), K(6), suggests that all of them radiated at a similar age, supporting the hypothesis that, most probably, global climatic changes favored human expansions simultaneously at a continental scale [21]. When only RFLPs and/or partial sequence data are available, U8a haplotypes in general can be identified by the -7055 Alu I RFLP or the 73, 282 HVSII motif, and the majority of U8a1 derivates in particular by the16146, 16342 HVSI motif. In turn, U8b can be classified by the 16189, 16234 HVSI motif. From a total of 20,563 sequences studied 10,677 could be, unequivocally, U8a analyzed (see Additional file 1) and referenced (see Additional file 2). For the remaining (9,886), in general, only their U8a1 assignation was possible. Analysis of 19,133 Eurasian, and 1,430 North African published and unpublished RFLP/HVSI/HVSII sequences showed a scattered but widespread U8a/1 distribution that is restricted to Europe. Its frequency (See Additional file 1 for the European distribution of subhaplogroup U8a, and Additional file 2 for references) ranges from 0 in the majority of samples to 8% (although with a 95% coefficient range around 2.9–21.4) in the region of Var in Southeast France [22]. In spite of its moderate sample size an important characteristic of this region is its high U8a polymorphism as all the three lineages detected are different. The U8a eastern boundaries seem to be in the Volga region near the Urals [23]. U8b is also a quantitatively minor clade that partially overlaps with U8a in Europe. However, its presence in the Caucasus [24], Iran [17], the Near East [25], and North Africa [26], where U8a has not been detected, attests a more southern geographic distribution. The third sister clade K, is the most widespread and abundant covering the U8a and U8b ranges [24] and even reaching India [27]. A network [28] built with the 48 U8a sequences found (Fig. 2), could be rooted and resolved attending to the phylogeny of the U8 complete sequences. Its most ancestral node is represented by Bq1820 and the only Anatolian lineage assignable to haplogroup U8a, both carrying the CRS motif in HVSI, and transitions 73, 282 in HVSII. This ancient connection might trace the hypothetic route followed by the U8a ancestor from West Asia to the Basque country. The absence of U8a in North-Africa and its extremely rare presence in the eastern Mediterranean area further reinforces this continental route of entrance against a southern alternative. It is deduced from the network (Fig. 2) that a first U8a radiation in Europe affected Iberia, Central Europe and reached the Baltic. A second, U8a1, broader expansion further enlarged its range to Russia and Scotland where the U8a diversities are lower than in the central area (Table 1). As the total rooted network does not achieve the star genealogy, to calculate coalescence ages [29], we estimated the average distance and coalescence age to the U8a1 founder haplotype (16146 16342) and to the ancestral U8a haplotype (CRS) independently, representing, in the last case, the U8a1 radiation by only one basic lineage. Time estimations for the younger U8a1 expansion was 14 ± 5 Ky and 23 ± 14 Ky for the U8a subset that, if added to first, would give a total time for the U8a coalescence of around 37 ± 14 Ky. Notice that both HVSI estimations are higher than, but not significantly different, from those calculated using complete sequences. To compare the U8a diversity (p ± σ) among regions, we grouped the European populations in different Paleolithic areas [24, 30]. The greatest diversity was found in the Iberian Peninsula when Basques are included, followed by the North Central area (Table 1). These data agree with the primary and secondary origins of expansion proposed on phylogenetic grounds, weakening the possibility that Basques would have obtain their total U8a diversity through recent immigrations from other European areas and reinforcing the hypothesis that the first U8a radiation in Europe happened in an area in which the Basque country was included.

Figure 1
figure 1

Phylogenetic tree based on complete U8 sequences. Numbers along links refer to nucleotide positions. Open boxes are nodes from which other (not shown) sequences branch. A, C, indicate transversions; "d" deletions and "i" insertions. Star has the following mutations with respect to rCRS: 73, 263, 311i, 750, 1438, 2706, 4769, 7028, 8860, 11719, 14766 and 15326 and the following ones respect to L3*: 8701, 9540, 10398, 10873, 12705, 15301, 16223, 16519. Subject origins are: Dutch (Dut86; [38]), Spanish (Spa) and Italian (Ita) [20], 3 Finns (Fin; [14]). British-Australian (BriT7; Obayashi T, Tanaka M, personal communication), Jordan (Jor767) and 4 Basques (Bq 1820, 30, 47, 24)

Figure 2
figure 2

Reduced median network relating U8a HVSI/II sequences. Star is CRS for HVSI and 73, 282 for HVSII. Numbers along links refer to nucleotide positions minus 16000; Underlined subjects were complete sequenced. The broken line is the less probable link in accordance with completed sequences (Fig.1). Codes are: AUS = Austrian; BQ = Basque; BRI = British-australian; EST = Estonian; FIN = Finn; FRA = French; GER = German; KAR = Karelian; LIT = Lithuanian; NCE = North-central European; NEE = Northeast European; NOR = Norwegian; NSP = Northeast Spanish; POL = Polish; POR = Portuguese; RUS = Russian; SCT = Scotch; SIC = Sicilian; SLO = Slovenian; SPA = Spanish; SSP = South Spanish; SWZ = Swiss; TUK = Turk; VOL = Volga-Ural.

Table 1 Gene diversity (p) and frequency of U8a/1 in different European macro-regions.

Conclusion

In summary, the analysis of U8a lineages supports the idea that Basques have lived in their country since the Paleolithic, and that they could have participated in demographic re-expansions to repopulate central Europe in the last interglacial periods. Furthermore, these primitive U8a founders most probably reached the Basque area from the East through Europe and not through North Africa. However, the fact that we can trace some Basque lineages back to the Paleolithic does not support the generalized supposition that the present day Basque population is the best representative of Paleolithic Europeans. First of all, U8a haplotypes only represent 1% of the present day Basque maternal pool, therefore, a complex set of different mtDNA lineages with possible different histories are left unstudied. In addition, there is empiric evidence that Basques have received recent male gene flow from adjacent areas [31], and even possible maternal North African influences predating the Muslim Iberian invasion [32]. Furthermore, ancient DNA studies on Basque historic and prehistoric samples [33] have detected important mtDNA haplogroup frequency fluctuations along different periods. Definitively, like other European populations, Basques have also suffered migration and genetic drift effects throughout its long history.

Methods

Samples

DNA isolated from bucal swabs or blood samples from 211 autochthonous, unrelated Basques from the Iberian provinces were analyzed. Appropriate informed consent to anonymously use their data was obtained from all the individuals sampled.

HVSI-II and RFLPs

Total DNA was PCR amplified as in Pinto et al [34], and directly sequenced for both complementary strands as detailed in Rando et al [35]. A sequence of 978 bp of the HVSI-II of the mtDNA control region, from position 15997 to 00408 [36] was determined and sorted into defined haplogroups [24]. To confirm this HVS-based haplogroup classification, all individuals assigned to a specific haplogroup were additionally tested by restriction analysis of the diagnostic coding region mutations proposed to unambiguously classify sequences into haplogroups [24].

Complete mtDNA sequences

Four Basques (three U8a, one K1) and one Jordan (U8b) rare lineages belonging to the U/K haplogroup were fully sequenced. The complete mitochondrial DNAs (mtDNA) were amplified by PCR using primer pairs already described [19]. Amplified products were sequenced for both complementary strands with the Big Dye Terminator Cycle sequencing kit (Applied Biosystems). Sequencing reactions were analyzed on an Applied Biosystems 3100 DNA analyzer.

Genetic analyses

Genetic diversity (p ± σ) was estimated as the average number of nucleotide differences between two sequences [37], using the HVSI region in the range from 16070 to 16365 nucleotide positions.

Phylogenetic relationships among complete mtDNA sequences, and among control region mtDNA sequences, were established using the reduced median network algorithm [11]. In addition to our five sequences, seven lineages were added: Dutch (Dut86; GenBank: DQ112821) [[38] and P. Shen, pers. comm], Spanish (Spa, GenBank: AY882392) and Italian (Ita, GenBank: AY882393) [20], 3 Finns (Fin; GenBank: AY339551, AY339552 and AY339553) [14] and British-Australian (BriT7; [T. Obayashi, M. Tanaka, pers. comm.]).

The presence of U8a was tested in published sequences by the 16146 16342 HVSI motif and/or the 73 282 HVSII motif, and U8b by the 16189 16234 motif. 19,133 Eurasian and 1,430 North-African published and unpublished HVSI/HVSII sequences were analyzed.

The average sequence divergence [16] for complete sequences was converted in time applying a mutation rate of 1.26 × 10-8 [15].

Time estimations, based on control region mtDNA, were calculated as the mean divergence ρ [39] from inferred ancestral sequence types and converted into time by assuming that one transition within np 16090–16365 corresponds to 20,180 years [29].

Accesion numbers

The five complete mitochondrial DNA sequences are registered in [GenBank: DQ200801, DQ200802, DQ200803, DQ200804, and DQ200805].