Introduction

Molecular phylogeography of Y-chromosome haplogroups provides valuable clues to the history of human populations, especially in reconstructing ancient migrations and more recent episodes of gene flow (Underhill 2003). One of the most widespread Y-chromosome haplogroups in north Eurasia is haplogroup (hg) N, defined by markers M231 and LLY22g (Jobling and Tyler-Smith 1995; Cinnioglu et al. 2004). This haplogroup is represented by subclades N*, N1, N2 and N3, which are thought to be historically informative in north Eurasian populations (Rootsi et al. 2007). It is generally accepted that hg N is of east Asian origin, because haplotypes belonging to the ancient paragroup N* are still widely distributed in southeastern Asia and, rarely, even in south Siberia (Rootsi et al. 2007; Sengupta et al. 2006). However, the origin and migration routes of hgs N2 and N3 are not fully understood. According to the first studies, the most likely place of origin of these Y-chromosomes is the area of Mongolia and north China (Zerjal et al. 1997; Karafet et al. 2002). However, the substantial frequency of hgs N2 and N3 in Eastern Europe, mainly in Finno-Ugric and Turkic-speaking populations, suggests that these haplogroups may have originated in Eastern Europe (Rosser et al. 2000; Underhill et al. 2001). Moreover, STR diversity within hg N3 is higher in Eastern Europeans than in Siberians, which can be interpreted as supporting an origin of hg N3a in Eastern Europe (Villems et al. 2002). Recently, a phylogeographic study of hg N structure in north Eurasian populations has demonstrated that south Siberia could have been a transit area for hg N (including N3a and N2) on its way westward to Eastern Europe (Rootsi et al. 2007). However, there is still no evidence that the age of accumulated STR variation within hg N3a is higher in south Siberia.

In this study, we refine the picture of hg N phylogeographic structure by examining STR variation in a large number of individuals of south Siberian and Eastern European origin. The results provide clear evidence that hg N3a is older in south Siberia than in Eastern Europe. In addition, the substructure of hg N2-A revealed in south Siberia suggests that this region was the site of several expansions of hg N haplotypes at different times during the Holocene.

Materials and methods

Subjects and DNA typing

A total of 1,438 samples (whole blood and hair root samples) from unrelated males were collected in populations of south Siberia (south Altaians, Shors, Tuvinians, Tofalars, Sojots, Khakassians, Buryats, Kalmyks, Evenks, Yakuts, Evens, Koryaks), central and southwest Asia (Mongolians, Koreans, Tajiks, Iranians) and Eastern Europe (Russians) (Table 1). All samples were collected with appropriate ethical approval and informed consent.

Table 1 Distribution of haplogroups N3 and N2 (number of individuals, with % values in parentheses) in the populations studied

Hg N markers LLY22g (for the whole hg N) (Jobling and Tyler-Smith 1995), Tat (for N3) (Zerjal et al. 1997), M178 (for N3a) (Underhill et al. 2000; Kharkov et al. 2005) and P43 (for N2) (Karafet et al. 2002) were assayed as described in the cited papers. The Y-SNP haplogroup nomenclature used here follows the recommendations of the Y Chromosome Consortium (2002). A total of 234 samples belonging to haplogroups N2 and N3 were analyzed at 11 STR loci (DYS19, DYS385a/b, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439) using the PowerPlex® Y System (Promega Corporation, Madison, WI). Amplification products were analyzed on an ABI 3100 Genetic Analyzer (Applied Biosystems) with an appropriate filter set prepared on the basis of the relevant PowerPlex® Matrix Standards provided by Promega Corporation. Electrophoresis results were analyzed using Genscan v. 3.7 and Genotyper v. 3.7 software (Applied Biosystems). Alleles were designated by repeat numbers.

Data analysis

Median-joining (MJ) networks of hg N STR-haplotypes were constructed using the Network 4.1.1.2 program (http://www.fluxus-engineering.com). Networks were calculated by the median-joining method after processing the data with the reduced median network method in order to reduce reticulations within the network. For network construction, STR loci were weighted according to the average of their variability in the corresponding haplogroups. All of these approaches are described elsewhere (Bandelt et al. 2000).
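As an illustration of locus weighting by variability, the following minimal Python sketch assigns larger weights to less variable STR loci. It is not the procedure implemented in the Network program: the inverse-variance rule, the function name locus_weights, the max_weight cap and the toy repeat counts are all assumptions introduced here for illustration only.

```python
from statistics import pvariance


def locus_weights(haplotypes, locus_names, max_weight=10):
    """Hypothetical weighting scheme: weight each locus inversely to its
    variance in repeat number, so that slowly evolving loci count more.

    haplotypes  -- list of haplotypes, each a list of repeat counts
    locus_names -- locus labels, in the same order as the repeat counts
    max_weight  -- assumed upper bound on a weight (illustrative value)
    """
    # Population variance of repeat counts at each locus.
    columns = zip(*haplotypes)
    variances = {name: pvariance(col) for name, col in zip(locus_names, columns)}

    positive = [v for v in variances.values() if v > 0]
    if not positive:
        # No locus varies: give every locus the same weight.
        return {name: max_weight for name in locus_names}

    largest = max(positive)
    weights = {}
    for name, var in variances.items():
        if var == 0:
            # Invariant locus: assign the maximum weight.
            weights[name] = max_weight
        else:
            # The most variable locus gets weight 1; others scale inversely.
            weights[name] = min(max_weight, max(1, round(largest / var)))
    return weights


# Toy data (invented): three loci, four chromosomes.
toy_loci = ["DYS19", "DYS392", "DYS391"]
toy_haplotypes = [
    [14, 11, 10],
    [15, 11, 10],
    [16, 11, 11],
    [13, 11, 10],
]
toy_weights = locus_weights(toy_haplotypes, toy_loci)
```

In this toy example the most variable locus receives the lowest weight and the invariant locus the highest, which is the general idea behind down-weighting fast-mutating STRs to reduce reticulation.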

The age of microsatellite variation within Y-chromosome haplogroups was estimated as the average squared difference (ASD) in the number of repeats between all sampled chromosomes and the founder haplotype, averaged over microsatellite loci and divided by w = 6.9 × 10⁻⁴ per 25 years (Zhivotovsky et al. 2004). We followed the recommendation of Sengupta et al. (2006) and used the median haplotypes (formed by the median values of the repeat scores at each microsatellite locus within each haplogroup) as the founding ones.
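The age calculation described above can be sketched in a few lines of Python. This is a simplified illustration, not the code used in the study: the function names and the toy repeat counts are hypothetical, each locus is assumed to be scored as a single integer (so the two-copy locus DYS385a/b would need separate handling), and no standard error is computed.

```python
from statistics import median

MUTATION_RATE_W = 6.9e-4      # effective rate w per 25 years, as given in the text
YEARS_PER_GENERATION = 25     # time unit attached to w


def median_haplotype(haplotypes):
    """Founder haplotype taken as the per-locus median of repeat counts."""
    return [median(locus) for locus in zip(*haplotypes)]


def asd_age_years(haplotypes, founder=None):
    """Age estimate: average squared difference (ASD) from the founder,
    averaged over chromosomes and loci, divided by w, in years."""
    if founder is None:
        founder = median_haplotype(haplotypes)
    n_loci = len(founder)
    squared_sum = 0.0
    for hap in haplotypes:
        squared_sum += sum((allele - f) ** 2 for allele, f in zip(hap, founder))
    asd = squared_sum / (len(haplotypes) * n_loci)
    return asd / MUTATION_RATE_W * YEARS_PER_GENERATION


# Toy data (invented): four chromosomes typed at two loci.
toy_haplotypes = [
    [14, 11],
    [14, 11],
    [15, 11],
    [14, 10],
]
toy_age = asd_age_years(toy_haplotypes)
```

For the toy data the median haplotype is (14, 11) and the ASD is 2/8 = 0.25, which corresponds to roughly 9.06 ky under these assumptions. Because the estimate scales linearly with ASD, a few divergent chromosomes in a small sample can shift the age considerably.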

Published hg N STR-data for Eastern Europeans (Komis, Tatars, Maris, Udmurts, Bashkirs, Karelians, Vepsas, Estonians, Ukrainians and Slovaks) and Asians (Altaians, Tuvinians, Khakassians, Evenks, Eskimo and Chinese) (Rootsi et al. 2007) were also included in the analyses.

Results and discussion

SNP analysis of 1,438 Y-chromosomes from 17 ethnic groups representing populations of north Asia and Eastern Europe demonstrates that hgs N3a and N2 are relatively frequent in some populations (Table 1). Haplogroup N3a reaches its highest frequencies in Yakuts (80%), Tofalars (27%), Koryaks (25%) and Buryats (18%), and is also relatively frequent in all other Siberian populations and in the Russians studied (11%). Haplogroup N2 chromosomes are present in the majority of the Siberian samples, reaching their highest frequencies in Tofalars (43%), Khakassians (34%) and Evenks (24%). However, this haplogroup is absent from northeastern Siberian populations (Evens, Koryaks and Yakuts). In addition, hg N2 is very rare in Russians (only 0.2%).

To better understand the phylogenetic relationships between Y-chromosomes carrying N2 and N3 lineages, we determined their haplotypes at 11 microsatellite loci (Table 2). In total, 234 samples were genotyped. The age of accumulated STR variation within hg N3a, estimated using the method of Zhivotovsky et al. (2004), indicates that this haplogroup is more diverse in Eastern Europe than in Siberia (ages of 9.0 and 7.9 ky, respectively) (Table 3), in agreement with earlier observations by Villems et al. (2002). However, median network analysis shows that hg N3a in fact comprises two subclusters, N3a1 and N3a2, both present in Siberia and Eastern Europe (Fig. 1). This subdivision of hg N3a was suggested recently (Rootsi et al. 2007); however, phylogenetic and statistical analysis of the hg N3a subclusters had not been performed.

Table 2 Frequency of N2 and N3a haplotypes in the populations studied. Populations are coded as follows: Ru Russians, Yk Yakuts, Tv Tuvinians, Kk Koryaks, Ek Evenks, St Sojots, Mg Mongolians, Br Buryats, Al Altaians, Tf Tofalars, Kh Khakassians, Sh Shors, Ev Evens, Km Kalmyks. Sample sizes are given in parentheses. Presumed founder haplotypes for subclusters are shown in bold
Table 3 Coalescent times of haplogroups N3 and N2 in Siberia and Eastern Europe
Fig. 1

Median-joining network of haplogroup N3a based on 11 STR loci (DYS19, DYS385a/b, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438 and DYS439). The network includes 172 N3a chromosomes, of which 151 are novel (according to the data presented in Table 2) and 21 were reported elsewhere (Rootsi et al. 2007). Each circle represents a haplotype, defined by a combination of STR markers. Circle size is proportional to haplotype frequency. Haplotypes are labeled as follows: Ru Russian, Yk Yakut, Tv Tuvinian, Kk Koryak, Ek Evenk, St Sojot, Mg Mongolian, Br Buryat, Al Altaian, Tf Tofalar, Kh Khakassian, Ev Even, Es Estonian, Sl Slovak, Ko Komi, Ch Chuvash, Ma Mari, Ud Udmurt, Ba Bashkir, Uk Ukrainian, Vp Vepsa, Ka Karelian. Colors indicate the subdivisions within the haplogroup: white for N3a1 haplotypes, black for N3a2 haplotypes

The presumed founder haplotypes of the two subclusters (shown in bold in Table 2) are one-step neighbors, differing by a single repeat at the DYS391 locus. Nevertheless, this subdivision appears to be phylogenetically stable, because only four of the 40 N3a1 haplotypes (two in Russians, one in Yakuts and one in Khakassians) showed back-mutations from the DYS392-11 to the DYS392-10 allele (Table 2). Meanwhile, no back-mutations from the DYS392-10 to the DYS392-11 allele were found in subcluster N3a2. Our analysis shows that subcluster N3a1 is much older than N3a2, with ages of 9.1 and 5.0 ky, respectively. Importantly, subcluster N3a1 appears to be older in south Siberia than in Europe (10 and 8.2 ky, respectively), which suggests that the first expansion of N3a1 occurred in south Siberia and that this subcluster then spread westward, into the Urals and Eastern Europe. The ancestral haplotype of N3a1 is present in Eastern Europeans (Russians and Estonians) and in south Siberians from the east Sayan region (Tuvinians, Tofalars and Sojots), and it also gave rise to region-specific branches of N3a1 in Yakuts and Khakassians/Tuvinians (Fig. 1). Overall, this subcluster encompasses 58% of the N3a individuals studied here (Table 1). It is noteworthy that the overwhelming majority of Russians (83%), as well as the other Slavic and Baltic-Finnic individuals taken from the study of Rootsi et al. (2007), belong to subcluster N3a1.

Subcluster N3a2 has a different tree topology. It shows an almost star-like branching pattern in south Siberia, where it is present mostly in the Baikal region among Buryats, but a much more complex branching pattern in its Eastern European part, represented predominantly by Finno-Ugric and Turkic-speaking populations of the Volga-Ural region. The higher STR variation of subcluster N3a2 in the Volga-Ural populations than in south Siberians (6.6 vs. 3.7 ky) seems to be a consequence of serial bottlenecks and/or the small number of Eastern European N3a2 chromosomes studied. In addition, the Eastern European part of the N3a2 network displays clearly non-star-like features, which greatly reduce the precision of genetic age estimates (Saillard et al. 2000). To resolve this question, much more detailed information on STR variation should be obtained from Finno-Ugric and Turkic-speaking populations of the Volga-Ural region. Meanwhile, traces of recent expansions of subcluster N3a2 are evident in several populations of south Siberia, mainly Buryats and south Altaians.

It has recently been demonstrated that N2 haplotypes form two separate subgroups, N2-A and N2-E, characterized by Asian and European patterns of distribution, respectively (Rootsi et al. 2007). Analyzing STR variation of N2 haplotypes in a large sample of Siberians, we found that all of them belong to cluster N2-A. According to our data, the age of cluster N2-A is 4.8 ± 1.6 ky, which is somewhat lower than the estimate of Rootsi et al. (2007). Median network analysis of N2-A haplotypes revealed two subclusters whose founder haplotypes differ at loci DYS19, DYS391 and DYS439. Subcluster N2-A1, whose founder haplotype is present in different populations (Khakassians, Shors, Mongolians, Altaians, Evenks and Tofalars), appears to be ancestral, later giving rise to N2-A2 and N2-E. This follows from its central position in the median-joining network constructed taking into account the location of N* lineages (Fig. 2). The age of STR variation within N2-A1 is slightly higher than that within N2-A2 (2.4 and 1.8 ky, respectively). Interestingly, the N2-A2 expansion is restricted mainly to the Tuva region.

Fig. 2

Median-joining network of haplogroup N2 based on 11 STR loci (DYS19, DYS385a/b, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438 and DYS439). The network is rooted in haplogroup N*, represented by Chinese (Cn) haplotypes (Rootsi et al. 2007). The network includes 97 N2 chromosomes, of which 84 are novel (according to the data presented in Table 2) and 13 were reported elsewhere (Rootsi et al. 2007). Each circle represents a haplotype, defined by a combination of STR markers. Circle size is proportional to haplotype frequency. Haplotypes are labeled as follows: Ru Russian, Tv Tuvinian, Ek Evenk, Mg Mongolian, Br Buryat, Al Altaian, Tf Tofalar, Kh Khakassian, Sh Shor, Km Kalmyk, Em Eskimo, Ta Tatar, Ko Komi, Ma Mari, and Vp Vepsa

Overall, the data obtained indicate that south Siberia was the site of several expansions of hg N2 and N3 haplotypes at different times during the Holocene. Future examinations of Y-chromosome variation using a combined STR-SNP approach in additional population samples may enable a better definition of the differences between north Eurasian population groups.