Introduction

The area between the Dniester River and the eastern Carpathian mountain range, which we refer to as the Dniester–Carpathian region, is an important geographical link between eastern Europe and the Balkans. From the Neolithic onwards, advanced farming traditions and technologies spread from the Balkan Peninsula into eastern Europe through the Dniester–Carpathian territory, while nomadic tribes of the Kurgan cultures moved in the opposite direction, from the Eurasian heartlands towards the west and southwest. The constant interaction of the resident agricultural populations with nomadic tribes was an important factor in the demographic development of this area (for review see Dergachev 1999; Velicanova 1975). At the end of the second millennium BC, the Dniester–Carpathian region was inhabited by various Thracian tribes, which were later Romanized during the Roman domination of southeastern Europe. In the fifth to seventh centuries AD, Slavic groups settled in southeastern Europe and came into close contact with the Romanized populations; these interactions gave rise to the old Romanic community north of the Danube. This complex history produced a considerable fragmentation of the region's cultural and linguistic landscape, and its inhabitants still differ in culture and language today. Besides the predominant Romanian-speaking population (Romanians and Moldavians), Slavic peoples (Russians, Ukrainians and Bulgarians), Gagauzes, Roma (Gypsies) and other nationalities live here.

Among this ethnic variety, the Turkic-speaking Gagauzes deserve special attention, since they are linguistically isolated within the Indo-European-speaking majority. The Gagauzes speak an Oghuz Turkic language, belonging to the same branch as Turkish, Azerbaijani and Turkmen. However, the Gagauzes differ from these Turkic peoples in two important respects: (1) besides the main South-Turkic (Oghuz) element, the Gagauz language contains a North-Turkic (Tartar or Kypchak) element, and (2) the Gagauzes are Orthodox Christians. Several hypotheses about the ethnogenesis of the Gagauzes have been proposed (for review see Guboglo 1967). Some researchers consider the Gagauzes descendants of the nomadic Uzi, Pechenegs and Polovtsi, who are known to have crossed the Danube and settled in the Balkans in the tenth to thirteenth centuries. Others regard them as descendants of the Seljuk Turks, who came from the Anatolian Peninsula and established a short-lived state in northeastern Bulgaria in the second half of the thirteenth century AD. According to a third scenario, the Gagauzes are Bulgarians who were Turkicized during the Ottoman occupation of the Balkans in the fifteenth to nineteenth centuries.

Despite the importance of this region for the population history of Europe, genetic studies of the Dniester–Carpathian populations are scarce and generally restricted to analyses of “classical” markers (Varsahr et al. 2001, 2003, 2006). DNA polymorphisms provide a rich source of information on the genetic structure and evolutionary history of human populations. To gain insight into the genetics of the Dniester–Carpathian region, we studied 12 polymorphic Alu markers in six autochthonous Dniester–Carpathian populations. Our specific goals were (1) to determine the distribution of Alu insertions in the autochthonous Dniester–Carpathian populations, (2) to quantify the degree of genetic differentiation within the Dniester–Carpathian region, and (3) to assess the genetic relationships among the Dniester–Carpathian populations and their relationships to linguistically and culturally related populations from Europe and Asia.

Materials and methods

A total of 513 unrelated autochthonous individuals from the Dniester–Carpathian region were analyzed. The six samples comprised Moldavians from the villages of Karahasani (n = 123) and Sofia (n = 82), Gagauzes from the villages of Kongaz (n = 72) and Etulia (n = 64), Ukrainians from the village of Rashkovo (n = 85), and Romanians from the Piatra-Neamt and Bacau districts of Romanian Moldova (n = 87) (Fig. 1). A 5-ml blood sample was collected from each individual by venipuncture into EDTA after obtaining the participant's consent and information on his or her ancestry. DNA was extracted from peripheral blood lymphocytes by a salt-based extraction method (Miller et al. 1988) or with Amersham blood reagents and protocols (Amersham, Little Chalfont, UK). Twelve human-specific autosomal Alu polymorphisms were typed in each sample using the primers and PCR conditions described previously (Batzer et al. 1996; Arcot et al. 1995a, b; Majumder et al. 1999; Comas et al. 2000). The PCR products were separated on agarose gels, stained with ethidium bromide and visualized under UV light, and the genotypes were recorded.

Fig. 1

Locations of the sampled populations in the Dniester–Carpathian region

Allele frequencies and heterozygosities were calculated as described by Zhivotovsky (1991). Exact tests for Hardy–Weinberg equilibrium (Guo and Thompson 1992), gene diversities (Nei 1987), analyses of molecular variance (AMOVA) and pairwise genetic distances (FST) (Excoffier et al. 1992) were computed with the software package Arlequin version 2.0 (Schneider et al. 2000), using 10,000 permutations. The genetic differentiation index GST was calculated according to Nei (1973). Nei's genetic distances were computed between pairs of populations and represented in a neighbor-joining tree using the PHYLIP 3.5 package (Felsenstein 1993); 1,000 bootstrap replications were performed to assess the support for the branching structure of the tree. The Kruskal–Wallis test, Spearman's correlation analysis and principal component (PC) analysis based on the correlation matrix of the Alu insertion frequencies were performed with the STATISTICA package (StatSoft 1995).
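Nei's genetic distance is the quantity from which the neighbor-joining trees are built. The sketch below assumes Nei's (1972) standard distance, D = −ln(Jxy / √(Jx·Jy)), applied to biallelic loci, where Jx, Jy and Jxy are the within- and between-population probabilities of drawing identical alleles. The text does not state which Nei distance option was used in PHYLIP, so this choice, and the function name nei_standard_distance, are illustrative assumptions rather than the exact procedure used here.

```python
import math


def nei_standard_distance(freqs_x, freqs_y):
    """Nei's (1972) standard genetic distance for biallelic loci (a sketch).

    freqs_x, freqs_y: insertion-allele frequencies, one value per locus,
    listed in the same locus order for both populations.
    """
    if len(freqs_x) != len(freqs_y):
        raise ValueError("both populations must be typed for the same loci")
    jx = jy = jxy = 0.0
    for p, q in zip(freqs_x, freqs_y):
        # Each biallelic locus contributes both alleles: p and 1 - p.
        jx += p * p + (1 - p) * (1 - p)
        jy += q * q + (1 - q) * (1 - q)
        jxy += p * q + (1 - p) * (1 - q)
    # Using sums instead of per-locus means is fine: the factor cancels in the ratio.
    return -math.log(jxy / math.sqrt(jx * jy))
```

Identical frequency profiles give a distance of zero, and the value grows as the populations' insertion frequencies diverge across loci.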

Results

Allele frequencies and genetic diversity within populations

The number of chromosomes examined and the allele frequencies of the 12 Alu markers are given in Table 1. All loci were polymorphic in all populations. Insertion frequencies were similar in all groups studied and generally fell within the range observed in European populations (for comparisons see Stoneking et al. 1997; Comas et al. 2000, 2004; Romualdi et al. 2002). However, the insertion frequency at the TPA25 locus in the Moldavian sample from Sofia (0.659) lies slightly outside the European range and is close to the highest values found thus far [in Madras, India (0.690) and Sri Lanka (0.724); Antunez-de-Mayolo et al. 2002]. We also note relatively low insertion frequencies at the TPA25, B65, D1 and A25 loci in the Gagauzes from Etulia, and at the HS3.23 locus in the Gagauzes from Kongaz.
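Insertion frequencies such as those in Table 1 are usually obtained by gene counting from the PCR genotypes (+/+, +/−, −/−). The following sketch assumes simple gene counting with a binomial standard error; it is not necessarily identical to the Zhivotovsky (1991) estimator cited in the Methods, and the function name and example counts are hypothetical.

```python
import math


def insertion_frequency(n_plus_plus, n_plus_minus, n_minus_minus):
    """Gene-counting estimate of the Alu insertion (+) allele frequency.

    Returns (frequency, standard_error, number_of_chromosomes).
    """
    n_individuals = n_plus_plus + n_plus_minus + n_minus_minus
    if n_individuals == 0:
        raise ValueError("no genotyped individuals")
    chromosomes = 2 * n_individuals
    # +/+ homozygotes carry two insertion copies, heterozygotes one.
    p = (2 * n_plus_plus + n_plus_minus) / chromosomes
    # Binomial standard error, treating chromosomes as independent draws
    # (a reasonable approximation under Hardy-Weinberg equilibrium).
    se = math.sqrt(p * (1 - p) / chromosomes)
    return p, se, chromosomes


# Hypothetical counts: 30 (+/+), 50 (+/-), 20 (-/-) individuals.
p, se, n_chrom = insertion_frequency(30, 50, 20)
```

As an illustration only: if all 82 Moldavians from Sofia were typed at TPA25 (164 chromosomes; Table 1 gives the actual counts), the binomial standard error of 0.659 would be about 0.037, so the Sofia value lies within one standard error of the Madras value (0.690).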

Table 1 Alu insertion frequencies in Dniester–Carpathian populations. The frequency indicated for each bi-allelic marker refers to the presence of the insert, except for CD4del. n Number of chromosomes. Moldavians: K Karahasani, S Sofia. Gagauzes: K Kongaz, E Etulia

The observed and expected genotype frequencies were in agreement in most populations. Only 3 of the 72 tests for Hardy–Weinberg equilibrium (12 loci in six populations) showed significant departures from equilibrium (D1 in Ukrainians, HS3.23 in Romanians and HS2.43 in the Gagauz sample from Kongaz). Because these deviations are not concentrated in any particular locus or population, and because about 3.6 significant results would be expected by chance alone if each of the 72 tests were performed at the 5% level, they probably reflect random statistical fluctuation.
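For a biallelic locus such as an Alu insertion, an exact Hardy–Weinberg test can be written by enumerating every heterozygote count compatible with the observed allele counts and summing the probabilities of outcomes no more likely than the observed one. The sketch below uses this full enumeration, which is feasible for two alleles, rather than the general multi-allelic procedure of Guo and Thompson (1992); the function name is illustrative.

```python
import math


def hwe_exact_pvalue(n_aa, n_ab, n_bb):
    """Exact Hardy-Weinberg test for one biallelic locus (a sketch)."""
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab  # copies of allele A
    n_b = 2 * n - n_a      # copies of allele B

    def log_prob(het):
        # Probability of `het` heterozygotes given n and the allele counts.
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (math.lgamma(n + 1) - math.lgamma(hom_a + 1)
                - math.lgamma(het + 1) - math.lgamma(hom_b + 1)
                + het * math.log(2)
                + math.lgamma(n_a + 1) + math.lgamma(n_b + 1)
                - math.lgamma(2 * n + 1))

    observed = log_prob(n_ab)
    p_value = 0.0
    # Heterozygote counts must have the same parity as n_a.
    for het in range(n_a % 2, min(n_a, n_b) + 1, 2):
        lp = log_prob(het)
        if lp <= observed + 1e-9:  # small tolerance for rounding error
            p_value += math.exp(lp)
    return min(p_value, 1.0)
```

A sample in exact equilibrium proportions, such as 25/50/25, gives a P-value near 1, while a complete absence of heterozygotes gives a vanishingly small one.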

Six of the 12 loci (ACE, TPA25, FXIIIB, B65, D1, CD4del) showed high heterozygosity, close to 0.5, the maximum for a biallelic locus (Table 2). Four loci (APOA1, A25, HS2.43 and HS4.65) showed low diversity (0.06–0.18). Average heterozygosity per population did not differ significantly between the samples (Kruskal–Wallis test, P = 0.9957), as expected given the similar Alu insertion frequencies found in all samples.
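The heterozygosity values in Table 2 can be related to allele frequencies through the gene diversity of a biallelic locus, 1 − (p² + q²) = 2pq, which peaks at 0.5 when p = 0.5. The sketch assumes Nei's (1987) small-sample correction 2n/(2n − 1) for n individuals (2n chromosomes); the function names are illustrative, and the published values may include further corrections.

```python
def unbiased_heterozygosity(p, chromosomes):
    """Nei's (1987) unbiased gene diversity for a biallelic locus.

    p: insertion-allele frequency; chromosomes: number of sampled chromosomes.
    """
    # 1 - (p^2 + q^2) equals 2pq; the factor 2n / (2n - 1) removes sampling bias.
    return chromosomes / (chromosomes - 1) * (1 - p * p - (1 - p) * (1 - p))


def average_heterozygosity(freqs, chromosomes):
    """Mean of the per-locus values for one population."""
    values = [unbiased_heterozygosity(p, n) for p, n in zip(freqs, chromosomes)]
    return sum(values) / len(values)
```

Ignoring the small-sample correction, 2pq = 0.18 corresponds to p = 0.10 (or 0.90) and 2pq = 0.06 to p ≈ 0.03 (or 0.97), so the four low-diversity loci are those at which one allele has a frequency of only a few percent to about 10%.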

Table 2 Heterozygosities and genetic differentiation indices for individual and for all loci (considered jointly). Moldavians: K Karahasani, S Sofia; Gagauzes: K Kongaz, E Etulia

Genetic relationships between populations

To analyze the genetic relationships among the populations, we followed two approaches: tree reconstruction and principal component (PC) analysis. These calculations were based on our own data together with published results for 11 Alu loci in southeastern European populations (Stoneking et al. 1997; Romualdi et al. 2002; Comas et al. 2004). Figure 2 shows the consensus tree of the 1,000 trees generated by bootstrap resampling. In the consensus tree, the compared populations do not form well-defined clusters. The low bootstrap support for the tree topology suggests the absence of substantial genetic barriers within southeastern Europe, although the bootstrap is known to underestimate the true level of statistical support (Sitnikova et al. 1995). Visual inspection of the tree nevertheless suggests a major distinction between northern and southern populations. The results of the PC analysis confirmed the pattern observed in the consensus tree (Fig. 3). The first PC, which explains 24% of the variation in allele frequencies, tends to separate the southern and western Mediterranean populations (Turkish Cypriots, Greek Cypriots, Turks, northeastern Greeks, Albanians, Albanian Aromuns) from the northern and Balkan–Carpathian populations (Macedonians, Macedonian Aromuns, Romanians, Moldavians, Ukrainians, Gagauzes). Strong negative correlations of the first PC with geographical latitude (Spearman’s R = −0.6887, P = 0.00233) and with distance from the Ukrainian settlement (Spearman’s R = −0.6078, P = 0.00964) provide quantitative support for the pattern seen in the tree and the PC plot. Along the second principal axis, which explains 20% of the total genetic variance, the Gagauzes from Etulia and the Romanian Aromuns stand apart from the rest of the populations at the positive pole, and the northeastern Greeks at the negative pole. No correspondence between the linguistic affiliation of the compared populations and their genetic differentiation is observed in the plot.
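Spearman's coefficient, used above to relate PC1 scores to latitude, is Pearson's correlation computed on ranks, with tied values receiving their average rank. The stdlib-only sketch below computes the coefficient only; it does not reproduce the P-values reported above (computed in STATISTICA), and the function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Because only ranks enter the calculation, any monotone relationship between PC scores and latitude yields |R| = 1, which makes the coefficient robust to the nonlinear scale of PC axes.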

Fig. 2

Consensus tree depicting the relationships among the southeastern European populations analyzed for 11 Alu polymorphisms. Numbers on the branches are bootstrap values based on 1,000 replications. GAGE Gagauzes from Etulia, GAGK Gagauzes from Kongaz, MOLK Moldavians from Karahasani, MOLS Moldavians from Sofia, ROME Romanians from Piatra-Neamti and Buhusi, UKR Ukrainians, AALB Albanian Aromuns, ALB Albanians, AMK Macedonian Aromuns from Krusevo, AMS Macedonian Aromuns from Stip, AROM Romanian Aromuns, GRET Greeks from Thrace, MAC Macedonians, ROMP Romanians from Ploiesti, TURA Turks from Anatolia (Comas et al. 2004), TURC Turkish Cypriots, GREC Greek Cypriots (Stoneking et al. 1997; Romualdi et al. 2002). The populations examined in the present study are underlined and in italics

Since we are particularly interested in the origin of the Gagauzes, we also assessed their genetic relationship to Turkic populations from Central Asia. To compare the southeastern European and Central Asian populations, we used data on eight polymorphic Alu loci (ACE, TPA25, PV92, APOA1, FXIIIB, A25, B65, D1) previously published for the Uyghurs (Xiao et al. 2002), Uzbeks, Kazakhs, and northern and southern Kyrgyzes (Khitrinskaya et al. 2003). These eight loci are a subset of the 12 loci we typed; data for the remaining loci were not available in the literature. The topology of the consensus tree (Fig. 4) generally reflects the racial classification of the populations (Alexeev 1974). The Kyrgyzes and the Kazakhs, who are assigned to the Mongoloid race, cluster together at a considerable distance from the European populations. Bootstrap values within the European cluster are very small, and the European samples are grouped neither by geography nor by language, suggesting that eight Alu loci are insufficient to resolve the relationships among these geographically close populations. The Uzbeks and the Uyghurs, who are considered populations of mixed Mongoloid–Caucasoid ancestry, occupy an intermediate position in the tree; the nodes separating them from the Mongoloid and Caucasoid clusters show relatively strong bootstrap support (1,000 replications). Both Gagauz samples group with the European samples.
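Bootstrap support for gene-frequency trees is typically obtained by resampling loci, not individuals, with replacement. The sketch below illustrates that idea on a simpler question than a full tree: in what fraction of locus-resampled replicates is population a closer to b than to c? It assumes Nei's (1972) standard distance and hypothetical function names; it is not the PHYLIP pipeline used for Figs. 2 and 4.

```python
import math
import random


def nei_distance(px, py, loci):
    """Nei's (1972) standard distance over the chosen (possibly repeated) loci."""
    jx = sum(px[i] * px[i] + (1 - px[i]) * (1 - px[i]) for i in loci)
    jy = sum(py[i] * py[i] + (1 - py[i]) * (1 - py[i]) for i in loci)
    jxy = sum(px[i] * py[i] + (1 - px[i]) * (1 - py[i]) for i in loci)
    return -math.log(jxy / math.sqrt(jx * jy))


def bootstrap_support(freqs, a, b, c, replicates=1000, seed=1):
    """Fraction of replicates in which population a is closer to b than to c.

    freqs maps a population code to its insertion frequencies, one per locus.
    """
    rng = random.Random(seed)
    n_loci = len(freqs[a])
    hits = 0
    for _ in range(replicates):
        # The unit of resampling is the locus, drawn with replacement.
        loci = [rng.randrange(n_loci) for _ in range(n_loci)]
        if nei_distance(freqs[a], freqs[b], loci) < nei_distance(freqs[a], freqs[c], loci):
            hits += 1
    return hits / replicates
```

With only eight loci, each replicate omits on average (7/8)^8 ≈ 34% of them (about 2.7 of 8), so replicate distances can vary considerably; this is consistent with the low support among the European samples noted above.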

Fig. 3

Genetic affinities among 17 southeastern European populations based on the first two principal components (PCs) of allele frequencies at 11 Alu loci. Population codes as in Fig. 2. The populations examined in the present study are underlined and in italics. Symbols on the PC plot represent the linguistic classification of the samples: plus signs Italic, filled circles Slavic, filled triangles Turkic, inverted filled triangles Albanian, asterisks Greek

Genetic differentiation between populations

To assess interpopulation variability within the Dniester–Carpathian region, we performed an AMOVA (Table 2). The contribution of individual loci to interpopulation variability in the region was low; only for two loci (TPA25 and HS3.23) were the FST values significantly different from zero. The FST over all loci indicates that only 0.38% of the total variance in allele frequencies is due to differences between populations, with the remainder due to differences within populations. Although this value implies a very low level of population subdivision, it is significantly greater than zero, indicating slight but significant differentiation among the Dniester–Carpathian populations.
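The significance of a differentiation statistic is usually assessed by permuting individuals among populations while keeping sample sizes fixed. The sketch below shows that logic. As a simplification it uses a multilocus GST-type statistic computed from individual genotypes instead of the AMOVA-based FST computed by Arlequin, so its values will not match Table 2. Genotypes are coded as 0, 1 or 2 insertion copies per locus, and the function names are hypothetical.

```python
import random


def multilocus_gst(populations):
    """GST-type statistic from individual genotypes.

    populations: list of populations; each population is a list of
    individuals; each individual is a tuple of insertion copies (0, 1 or 2)
    per locus.
    """
    n_loci = len(populations[0][0])
    h_s = h_t = 0.0
    for locus in range(n_loci):
        # Insertion frequency in each population by gene counting.
        freqs = [sum(ind[locus] for ind in pop) / (2 * len(pop))
                 for pop in populations]
        # H_S: mean within-population diversity (populations weighted equally).
        h_s += sum(2 * p * (1 - p) for p in freqs) / len(freqs)
        # H_T: diversity at the mean frequency.
        p_bar = sum(freqs) / len(freqs)
        h_t += 2 * p_bar * (1 - p_bar)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def permutation_test(populations, permutations=10000, seed=1):
    """Return (observed statistic, permutation P-value)."""
    rng = random.Random(seed)
    observed = multilocus_gst(populations)
    sizes = [len(pop) for pop in populations]
    pooled = [ind for pop in populations for ind in pop]
    at_least_as_large = 0
    for _ in range(permutations):
        # Reassign individuals to populations at random, keeping sample sizes.
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if multilocus_gst(shuffled) >= observed:
            at_least_as_large += 1
    # Counting the observed arrangement keeps the P-value above zero.
    return observed, (at_least_as_large + 1) / (permutations + 1)
```

Under this scheme even a very small statistic, such as the 0.38% reported above, can be significant when sample sizes are large, because random reassignments of hundreds of individuals rarely produce that much structure by chance.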

We also analyzed genetic differentiation within Southeast Europe, using the same set of populations and loci as in the phylogenetic analyses. Since only allele frequency data were available from the literature, we used Nei's (1973) GST approach. Within Southeast Europe, the fraction of the genetic variance attributable to differences among populations (GST) was 1.61%, roughly half the level of differentiation within Europe as a whole (see Kutuev et al. 2006).
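When only population allele frequencies are available, as here, Nei's (1973) GST = (HT − HS)/HT can be computed directly: HS is the mean within-population gene diversity and HT the gene diversity at the mean frequency. The sketch weights populations equally, sums HS and HT over loci before taking the ratio, and applies no sample-size correction. These are assumptions, and the published value may have been computed with different weighting; the function name is illustrative.

```python
def nei_gst(freq_table):
    """Nei's (1973) GST from insertion frequencies (a sketch).

    freq_table: one row per population, one insertion frequency per locus.
    """
    n_pops = len(freq_table)
    n_loci = len(freq_table[0])
    h_s = h_t = 0.0
    for locus in range(n_loci):
        column = [row[locus] for row in freq_table]
        # Within-population diversity 2pq, averaged over populations.
        h_s += sum(2 * p * (1 - p) for p in column) / n_pops
        # Total diversity evaluated at the mean frequency.
        p_bar = sum(column) / n_pops
        h_t += 2 * p_bar * (1 - p_bar)
    return (h_t - h_s) / h_t
```

For two populations with frequencies 0.2 and 0.4 at a single locus, HS = 0.40 and HT = 0.42, giving GST = 1/21 ≈ 0.048.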

When Dniester–Carpathian populations were divided into three groups defined by language (Romanian, Gagauz and Ukrainian), no significant difference was observed between population groups (Table 3). When we extended the analysis of genetic differentiation to Southeast Europe, the component of genetic variance due to differences among linguistic groups (defined as in Fig. 3) was very low (0.50%)—even lower than the component due to differences among populations within groups (1.11%). These findings suggest that language does not explain the genetic affinities among the Dniester–Carpathian and southeastern European populations.
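As a worked check on the three-level partition, suppose the percentages are read as hierarchical AMOVA variance components (Excoffier et al. 1992): σ²a among linguistic groups, σ²b among populations within groups and σ²c within populations. The within-population share and the corresponding fixation indices then follow by arithmetic. This reading is an assumption, since Table 3 reports only the percentages.

```latex
\sigma^2_T = \sigma^2_a + \sigma^2_b + \sigma^2_c, \qquad
\frac{\sigma^2_c}{\sigma^2_T} = 100\% - 0.50\% - 1.11\% = 98.39\%

\Phi_{CT} = \frac{\sigma^2_a}{\sigma^2_T} = 0.0050, \qquad
\Phi_{SC} = \frac{\sigma^2_b}{\sigma^2_b + \sigma^2_c} = \frac{0.0111}{0.9950} \approx 0.0112, \qquad
\Phi_{ST} = \frac{\sigma^2_a + \sigma^2_b}{\sigma^2_T} = 0.0161
```

The total among-population share, 0.50% + 1.11% = 1.61%, coincides with the GST value reported for Southeast Europe.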

Table 3 Components of genetic variance (%) at three levels of population subdivision; populations were pooled according to their affiliation to linguistic groups
Fig. 4

Consensus tree of southeastern European and central Asian populations analyzed for eight Alu polymorphisms. Numbers on the branches are bootstrap values based on 1,000 replications. Codes for the southeastern European populations are as in Figs. 2 and 3; those for the other populations are as follows: KAZ Kazakhs, KYRN northern Kyrgyzes, KYRS southern Kyrgyzes, UZB Uzbeks (Khitrinskaya et al. 2003), UYGH Uyghurs (Xiao et al. 2002). The populations examined in the present study are underlined and in italics

Discussion

Pattern of Alu insertion variation in southeastern Europe

Previous analyses based on classical polymorphic markers (Cavalli-Sforza et al. 1994), autosomal DNA polymorphisms (Nasidze et al. 2001; Jorde and Wooding 2004; Tishkoff and Kidd 2004) and mitochondrial DNA (mtDNA) (Calafell et al. 1996) have shown that Europe is a genetically homogeneous continent. This conclusion rests on two lines of evidence. First, differentiation indices are small: the FST value for Europe is 2–7 times lower than for other continents and geographical areas. Second, genetic distances are small: in a neighbor-joining tree of world populations, European populations form a small, compact cluster, whereas other populations are connected by much longer branches. The Dniester–Carpathian autosomal gene pool follows the same pattern. Our analysis of 12 autosomal DNA polymorphisms has shown that allele frequencies in the Dniester–Carpathian populations are similar to each other and to those observed in other European populations, despite considerable linguistic differences. The genetic homogeneity of southeastern European populations suggests either a recent common ancestry of all these populations or strong gene flow that eliminated any initial differences. Given that the region has had a relatively high population density since the Neolithic and lies at the crossroads of routes connecting the cultural centers of the Middle East with different parts of Europe, both explanations are plausible.

Despite the low level of genetic differentiation of the southeastern European gene pool and the lack of correlation between linguistic and genetic geographic patterns, our analysis revealed some informative features of population structure. The first PC, which explains 24% of the total genetic diversity, is correlated with geographical latitude. This pattern of genetic differentiation within southeastern Europe is not surprising: it is consistent with results from classical and DNA markers (Cavalli-Sforza et al. 1994; Malaspina et al. 2001) and compatible with archaeological and paleoanthropological data. Since the Neolithic (7,500 BC), the eastern Mediterranean area has been continuously occupied by agricultural communities, which arose from the common Neolithic ‘package’ originating in the Near East (Renfrew 1987). The demographic history of the northern part of Southeast Europe differed from that of the southern part. Balkan–Mediterranean farming traditions developed in the north during the Neolithic–Early Eneolithic period (6,500–4,000 BC), but beginning in the Late Eneolithic, nomadic tribes of the Kurgan cultures, which had developed on an East European Mesolithic basis, penetrated into the Carpathian basin and the Balkans from the Pontic steppes (Dergachev 1999). The considerable differences in a set of morphological characters between the farming tribes of Southeast Europe and the Mesolithic and nomadic tribes of eastern Europe (Velicanova 1975) suggest that their gene pools differed in structure. The genetic differences between northern and southern populations of Southeast Europe observed in our work appear to reflect unequal proportions of the European (‘Mesolithic’) and Near Eastern (‘Neolithic’) components in their gene pools. The second PC, which shows no apparent geographical pattern, has no obvious interpretation.

Alu insertion polymorphisms and the origins of the Gagauzes

Are the Gagauzes descendants of the Turkic nomadic tribes of the South Russian steppe (Uzi, Pechenegs, Polovtsi, etc.) or of the Anatolian Turks (Seljuks and/or Ottomans)? In the first case, the Gagauzes should be genetically more similar to Turkic populations from the Eurasian heartlands, whereas in the second case they should resemble populations from Anatolia. Our previous analysis of classical polymorphisms in the Dniester–Carpathian region showed that the Gagauzes group genetically with their geographic neighbors rather than with any Turkic population (Varsahr et al. 2001, 2003). The present analysis, based on autosomal DNA markers, is consistent with these results. The Gagauz samples showed considerable genetic distances to Central Asian populations, and their position in the tree was not intermediate between southeastern European and Central Asian populations. Therefore, our data reject the hypothesis that the Gagauzes are direct biological descendants of the Turkic nomads from the South Russian steppes.

According to the other scenario, the Gagauzes are descendants of the Seljuk Turks who migrated to the Balkans from Anatolia in the second half of the thirteenth century. Although the Gagauzes show closer relationships with the Dniester–Carpathian populations than with the Turks from Anatolia and Cyprus, it should be noted that the differences between the populations mentioned above are not sufficiently large to rule out the hypothesis of a Seljuk origin of the Gagauzes. A problem with this scenario, however, is that it does not explain the presence of the Kypchak (Tartar) element in the Gagauz language, which could only have entered by the northern route from the Eurasian steppes.

The lack of correlation between linguistic and genetic differentiation in Southeast Europe (in particular in the Dniester–Carpathian region) suggests that ethnic and genetic differentiation occurred here relatively independently of each other. The genetic landscape of Southeast Europe was presumably formed long before the present linguistic and ethnic landscape took shape. Another possibility is that cultural barriers were not strong enough to prevent gene flow between populations. The Turkic language of the Gagauzes could be a case of language replacement. Replacement could have occurred through ‘elite dominance’, whereby the original Turkic migrant groups were so small that their genetic effect on the resident groups was negligible (Renfrew 1987). However, the elite dominance scenario is better suited to large populations, such as those of Anatolia and Azerbaijan, which comprise 70 million and 8 million people, respectively (Nasidze et al. 2001; Cinnioğlu et al. 2004). The Gagauzes are far less numerous (200,000). Thus, it also remains possible that they are a remnant of a once larger Turkic-speaking Orthodox group in southeastern Europe.

Conclusion

Our study of Alu polymorphisms indicates low levels of population differentiation in the Dniester–Carpathian region and in Southeast Europe as a whole. Although interpopulation differentiation within Southeast Europe is small, tree reconstruction and PC analysis distinguished southern from northern populations. These observations agree with results from classical markers and are compatible with archaeological and paleoanthropological data. The genetic affinities among Dniester–Carpathian and southeastern European populations do not reflect linguistic relationships, indicating that ethnic and genetic differentiation in these regions occurred largely independently of each other. Our dataset of 12 Alu markers thus allowed us to address the goals set out for this study. Nonetheless, more data are needed to verify these conclusions. In particular, marker systems with more pronounced ethnic specificity (e.g. Y-chromosome and mtDNA markers) may be useful, and more populations may need to be sampled.