Excavating Y-chromosome haplotype strata in Anatolia
- Cite this article as:
- Cinnioğlu, C., King, R., Kivisild, T. et al. Hum Genet (2004) 114: 127. doi:10.1007/s00439-003-1031-4
Analysis of 89 biallelic polymorphisms in 523 Turkish Y chromosomes revealed 52 distinct haplotypes with considerable haplogroup substructure, as exemplified by their respective levels of accumulated diversity at ten short tandem repeat (STR) loci. The major components (haplogroups E3b, G, J, I, L, N, K2, and R1; 94.1%) are shared with European and neighboring Near Eastern populations and contrast with only a minor share of haplogroups related to Central Asian (C, Q and O; 3.4%), Indian (H, R2; 1.5%) and African (A, E3*, E3a; 1%) affinity. The expansion times for 20 haplogroup assemblages were estimated from associated STR diversity. This comprehensive characterization of Y-chromosome heritage addresses many multifaceted aspects of Anatolian prehistory, including: (1) the most frequent haplogroup, J, splits into two sub-clades, one of which (J2) shows decreasing variances with increasing latitude, compatible with a northward expansion; (2) haplogroups G1 and L show affinities with south Caucasus populations in their geographic distribution as well as STR motifs; (3) frequency of haplogroup I, which originated in Europe, declines with increasing longitude, indicating gene flow arriving from Europe; (4) conversely, haplogroup G2 radiates towards Europe; (5) haplogroup E3b3 displays a latitudinal correlation with decreasing frequency northward; (6) haplogroup R1b3 emanates from Turkey towards Southeast Europe and Caucasia; and (7) high resolution SNP analysis provides evidence of a detectable yet weak signal (<9%) of recent paternal gene flow from Central Asia. The variety of Turkish haplotypes is witness to Turkey being both an important source and recipient of gene flow.
The Anatolian Peninsula (Asia Minor) provides an important geographic link between the Middle East, Asia and Europe. Accordingly, this region manifests an elaborate genetic constitution reflecting the consequences of numerous gene flow, admixture and local differentiation processes spanning from the late Pleistocene to the present day (Cavalli-Sforza et al. 1994). Both environmental and cultural influences associated with the spread of the Upper Paleolithic industries (Kuhn 2002), the Last Glacial Maximum (LGM) and Holocene warming since the Younger Dryas cold reversal, as well as the introduction of agriculture and succeeding Bronze Age, Greek, and Roman presence, may have left detectable traces in the gene pool. In addition, resettlements from Central Asia (Richards et al. 2000), as well as movements during the Ottoman Empire, including recent exchanges of numerous Greek and Turkish residents based upon religious affiliation during the 1920s, would add further potential complexity to the phylogeographic patterns in Anatolia. The question that we ask in this paper is whether any elements of the amalgamated Anatolian genetic composition can be attributed to particular ancient or recent chronologies/populations. While most human genetic diversity is affected by recombination, the low effective population size of clonal Y-chromosome segments (Shen et al. 2000) gives them greater sensitivity to detect events in the demographic histories of populations that may otherwise leave little imprint on the autosomal elements of the gene pool. The resulting, often non-random, correlations of binary marker defined haplogroups with geography (Underhill et al. 2001) and corresponding short tandem repeat (STR) variance (de Knijff 2000) provide a genetic metric with which to sieve through complex deposits of human history on both micro-geographic and temporal scales.
To begin to better understand how the succession and magnitude of events spanning millennia have contributed to the current genetic composition of Turkey, we have assessed patterns of Y-chromosome diversity distributed across Turkey plus Istanbul. The data illuminate numerous long-standing themes, including the Holocene expansions, contributions of agriculturalists to the European gene pool and genetic assessment of Caucasian and Central Asian gene flows.
Materials and methods
Polymorphisms and haplotyping
[Table: Description of Y-chromosome binary polymorphisms. Columns include total size (bp) and nucleotide change.]
STR variance, averaged over ten loci on binary haplotype backgrounds with sample sizes ≥7, was used to assess the relative level of diversity and phylogenetic substructure with geography. F tests, based upon the ratio of χ2 distributions of average variances, were used to compare average variances amongst geographic regions. STR data were also used to estimate haplogroup-specific expansion times by two methods. Both approaches assume a stepwise mutation model and an average evolutionary STR mutation rate of 0.0007 per STR locus per generation (Zhivotovsky et al. 2003), a value based upon a generation time of 25 years. One method assumes a star-like genealogy characteristic of continuous population growth, in which the variance is equal to the mutation rate per generation times the number of generations since expansion (Di Rienzo et al. 1994; Kittles et al. 1998). The other method employs a Bayesian algorithm. To estimate the time of Anatolian population expansions, we used the Markov chain Monte Carlo (MCMC) approach (Wilson et al. 1998) incorporated into the program BATWING to estimate posterior distributions for parameters of a given model of population history.
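The variance-based estimate described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the toy haplotypes and the example variance of 0.56 are hypothetical, and the use of the sample variance (n − 1 denominator) is an assumption, since the text does not say which estimator was applied.

```python
from statistics import variance

MUTATION_RATE = 0.0007   # per STR locus per generation (value quoted in the text)
GENERATION_YEARS = 25    # generation time quoted in the text


def mean_str_variance(haplotypes):
    """Average the per-locus variance of repeat counts across all loci.

    `haplotypes` is a list of equal-length tuples of repeat counts,
    one tuple per chromosome. Assumption: sample variance is used.
    """
    per_locus = [variance(alleles) for alleles in zip(*haplotypes)]
    return sum(per_locus) / len(per_locus)


def expansion_time_years(mean_variance, mu=MUTATION_RATE, gen=GENERATION_YEARS):
    """Star-like genealogy: variance = mu * generations, so time = variance / mu."""
    generations = mean_variance / mu
    return generations * gen


# Hypothetical two-locus haplotypes, for illustration only.
toy_haplotypes = [(14, 12), (15, 12), (16, 13), (15, 13)]
toy_variance = mean_str_variance(toy_haplotypes)

# A hypothetical mean variance of 0.56 corresponds to 800 generations.
example_years = expansion_time_years(0.56)
```

Because the estimate is linear in both the variance and the assumed mutation rate, any uncertainty in either carries over proportionally to the inferred time.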
We considered a model of exponential growth from an initial constant population size beginning at time Beta, with an effective population size prior distribution specified as a gamma (1, 0.0001), as used by Weale et al. (2001). The prior distribution for the STR mutation rate was specified as a gamma distribution with a mean of 7×10⁻⁴ per locus per generation, and the prior distribution of the growth rate was assigned a gamma (1, 0.001). Beta was assigned a broad uniform prior (0, 15). Priors were specifically chosen to be as uninformative as possible so as to minimally impact the results. We calculated the mean, median, and 2.5 and 97.5% quantiles for the posterior distributions for Beta, the estimated time of population expansion. Beta is expressed as a fraction of the initial population size and is multiplied here by generation time to yield values in standard units of time. Calculations were based on 50,000 runs of the MCMC estimator after a 20,000 run “burn in time.”
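The posterior summaries mentioned above (mean, median, and 2.5 and 97.5% quantiles after discarding the burn-in) can be illustrated with a short Python sketch. It does not reproduce BATWING or its output format; `summarize_posterior`, `percentile` and the synthetic chain are hypothetical, and linear interpolation between order statistics is just one common quantile convention.

```python
from statistics import mean, median


def percentile(sorted_vals, q):
    """Quantile q (0..1) by linear interpolation between order statistics."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def summarize_posterior(chain, burn_in):
    """Drop the first `burn_in` draws, then summarize the remaining samples."""
    kept = sorted(chain[burn_in:])
    if not kept:
        raise ValueError("no samples left after burn-in")
    return {
        "mean": mean(kept),
        "median": median(kept),
        "q2.5": percentile(kept, 0.025),
        "q97.5": percentile(kept, 0.975),
    }


# Synthetic chain for illustration only: 5 burn-in draws, then 101 samples.
toy_chain = [999.0] * 5 + [float(i) for i in range(101)]
toy_summary = summarize_posterior(toy_chain, burn_in=5)
```

In the setting described in the text, the chain would hold 70,000 sampled values of Beta, of which the first 20,000 are discarded.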
Y chromosome haplogroups and associated diversity
[Table: Y-chromosome haplogroup variance and expansion times based on ten STR loci. Columns include Beta and percent quantiles (kyr) and initial effective population size per 1,000 individuals; rows include J1-M267 with long DYS388 alleles and J1-M267 with the short DYS388 allele.]
[Table: Correlations of Y-chromosome haplogroup frequencies with geography. Values are Spearman’s correlation coefficients; columns include distance from region 1.]
Haplogroup J1-M267 occurs at 9% frequency and is also uniformly distributed across Turkey (χ2=9.02, df=8, P=0.34), with the exception of eight samples localized to the northern geographic periphery that all have an unusual “short” 13-repeat DYS388 allele, which was confirmed by sequencing. No DYS388 alleles of the intermediate size of 14 tandem repeats were detected in any of our J1 samples. We propose that this subset of J1 lineages has a unique heritage since, besides the suggestive micro-geography, the occurrence of a short DYS388 allele on a J background is symptomatic of a deviation from the stepwise mutational process, as already proposed for this locus in a different allelic context (Nebel et al. 2001a).
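The P value attached to a χ2 statistic such as the one quoted for J1-M267 can be checked from the statistic and its degrees of freedom alone. The sketch below is illustrative and is not the authors' software: the function names are hypothetical, and the closed form used for the upper tail is valid only when the degrees of freedom are even, which holds for the nine-region comparisons (df=8).

```python
import math


def chi2_statistic(observed, expected):
    """Pearson chi-square statistic for matched observed/expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Closed form for even df only:
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for k in range(df // 2):
        total += term            # add (x/2)^k / k!
        term *= half / (k + 1)   # advance to the next term of the series
    return math.exp(-half) * total


# Statistic and df as quoted in the text for J1-M267 across nine regions.
p_j1 = chi2_sf_even_df(9.02, 8)   # ≈ 0.34, in line with the reported P value
```

For odd degrees of freedom a general implementation of the regularized incomplete gamma function would be needed instead.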
The G-M201 haplogroup occurs at 10.9% frequency (57/523) with a mean STR variance of 0.40, consistent with its early presence in Anatolia. One major clade, G2-P15, and two less frequent sub-clades, G1-M285 and G3-M287, account for all the variation observed except for one individual from Kars, region 4, who was left unresolved at the G-M201* level. The totality of G lineages does not show micro-geographic structure on the basis of the criteria used to describe the nine geographic regions (χ2=9.21, df=8, P=0.33), but the lineages do significantly correlate (χ2=4.11, P<0.043) when evaluated on the basis of the archeological boundaries of the Bronze Age Hattic and Kaska cultures (Fig. 1). In addition, variances of G2-P15 lineages are correlated with longitude (r=−0.72, P<0.03, n=9), showing higher variances towards western Anatolia. The distinctive G1-M285 lineages are restricted to region 3.
Haplogroup I-M170 is a major lineage cluster largely restricted to populations of Europe (Semino et al. 2000a). Despite its relatively low average frequency (5.3%) in Turkey, five major sub-clades were detected. The I-M170 chromosomes are more localized towards the west and show a significant correlation with longitude (r=−0.82) and with geographic distance from region 1 (r=−0.85), the European pole of Turkey (Table 3). While haplogroup I-M170 overall displays high STR variance (Table 2), the I1-P37 sub-clade, which accounts for almost one half of the lineages, shows significantly lower variance relative to other I lineages (Table 2).
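A rank correlation of the kind reported here (Spearman’s coefficient between regional haplogroup frequency and longitude) can be computed as the Pearson correlation of rank-transformed values. The following is a minimal sketch under stated assumptions: the function names are illustrative, and the nine longitude and frequency values are invented placeholders, not the study’s data.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical placeholder values for nine regions (NOT the published data).
longitude = [28.0, 29.5, 31.0, 33.0, 35.5, 37.0, 39.0, 41.0, 43.0]
frequency = [0.11, 0.09, 0.10, 0.06, 0.05, 0.04, 0.05, 0.02, 0.01]
rho = spearman(longitude, frequency)
```

A negative coefficient in this setup would indicate that frequency tends to fall as longitude increases, which is the direction of the trend described for I-M170.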
Haplogroup E3b-M35 occurs at an overall 10.7% frequency, with E3b1-M78 and E3b3-M123 accounting for all E representatives except a single E3b2-M81 chromosome. Although E3b1-M78 and E3b3-M123 occur at similar frequencies (5.0% and 5.5%, respectively), their associated mean STR variances (Table 2) are significantly different [F (280,250) = 2.83, P<0.01]. The more diverse Turkish E3b3-M123 lineages are correlated with latitude (Table 3).
Variance of R1b3-M269 and TaqI p49a,f Ht15, Ht35 STR haplotypes
In Turks, R1b3-M269 and R1a1-M17 occur at 14.7% and 6.9%, respectively. In addition, R1b3-M269 related Y-STR variance is significantly higher than that of R1a1-M17 [F (750,350) = 1.32, P<0.01]. While no micro-geographic substructure is detected in Turkey for R1b3-M269, the frequency of R1a1-M17 is higher in Eastern Turkey and its distribution significantly correlates with longitude across the nine regions (Table 3). The majority of L-M11 chromosomes occur in the most eastern regions 3 and 4 (χ2=17.99, df=8, P<0.021) and also have high levels of variance (Table 2).
Under an assumption of a negligible role of natural selection on Y-chromosome haplogroup distribution, the assessment of background STR variance can provide insights into haplogroup subdivision, size fluctuation, directionality of distribution and relative chronology amongst haplogroups. The haplogroup-specific variances may reflect potential associations with Upper Paleolithic, Holocene and agriculturalist processes. Although the occurrence of early agriculture in the Near East is almost contemporaneous with the onset of Holocene climatic warming, the consequences of growth and migration specifically due to agriculture are likely to be more recent.
Haplogroup J and the transition to agriculture
Although the entire J-M304 clade demonstrates a large microsatellite variance that under a continuous growth model dates to around 20 kyr, consistent with the LGM, the BATWING exponential growth model reveals a more recent post-LGM expansion (13.9 kyr). This secondary expansion originates from a low effective population size (n=184) and may indicate that the J clade in Turkey began to participate in demographic expansions during the onset of sedentism in Anatolia and the Levant; e.g., the Natufians (Bar-Yosef 1998). Previously, J clade representatives would have been accumulating STR diversity via genetic drift within various small groups of mobile hunter-gatherers during the LGM. We detected a significant reduction of variance of J2-M172 northwards in Turkey. This latitudinal trend could be a consequence of an Upper Paleolithic presence of J2-M172 in southern Anatolia and its subsequent spread north and west during the Holocene, likely catalyzed by the transition to agriculture (Ammerman and Cavalli-Sforza 1984; Underhill 2002). The northward gradient in J2-M172 variance is consistent with the archeological evidence that agro-pastoral economies of Northwest Anatolia were derived from the Çatal Höyük area in region 7 (Thissen 1999). The presence of J2-M172 related lineages successfully predicted the distribution of both Neolithic figurines and painted pottery attributed to agriculturalists (King and Underhill 2002). The Upper Paleolithic sites in Turkey (Öküzini cave, region 6) have been dated to 17,800 BC and suggest a continuous occupation into the subsequent Neolithic period (Kuhn 2002), while Neolithic sites are considerably fewer in Central and Northern Turkey (Roberts 2002). The J1-M267 and J2-M172 distributions in the Near East and Europe can be inferred from previously reported DYS388 data associated with Eu10 and Eu9, respectively (Semino et al. 2000a; Nebel et al. 2001b; Malaspina et al. 2001; Al-Zahery et al. 2003).
While both J1 and J2 are found in the Near East, haplogroup J1-M267 typifies East African and Arabian populations, with a decreasing frequency northwards. By contrast, the majority of J lineages in Europe are J2-M172 that radiated from the Levant, consistent with the distributions of mitochondrial J, K, T1 and pre-HV clades (Richards et al. 2002).
Although we currently lack additional binary polymorphisms capable of defining further informative subdivision within haplogroup J1-M267, the unusual short DYS388 13-repeat allele lineage provides a proxy. These peculiar chromosomes are distributed along the northern tier of Turkey. While this lineage has not been observed in Greece, it has been detected in Georgia (Semino, unpublished results), suggesting Black Sea coastal gene flow. A few lineages with potentially similar affinity have been observed scattered throughout the Middle East (Nebel et al. 2001b), although it is not possible to distinguish their affinity to haplogroup J-M304* or J1 since M267 data are unavailable. When the DYS388 “short” allele representatives are excluded on the assumption that they have a common origin, the residual assemblage of J1-M267 DYS388 “long” allele lineages contains numerous haplotypes, including both the purported “Cohen” and “Arab” modal haplotypes (Thomas et al. 2000; Nebel et al. 2002). The similarity of variances associated with the two counterbalancing J1 and J2 sub-clades suggests an enduring common demography. At this level of molecular resolution, the data do not distinguish between agricultural and pastoral domestic livelihoods despite the observation that lifestyle differences exist (Khazanov 1984). Notably, nomads are often more endogamous and participate in transhumant seasonal migrations (Cavalli-Sforza et al. 1994).
The J2f-M67 clade is localized to Northwest Turkey. It is well known that during this period, Northwest Anatolia developed a complex society that engaged in widespread Aegean trade referred to as “Maritime Troia culture,” involving both the western Anatolian mainland and several of the large islands in the eastern Aegean, Chios, Lemnos and Lesbos (Korfmann 1996). Another J2 component is intriguing. Although J2e-M12 lineages occur at low frequencies, they are widely distributed in the Middle East (Scozzari et al. 2001) and India (Kivisild et al. 2003), as well as in Saami from Kola, Russia (Raitio et al. 2001). By comparing data sets (Malaspina et al. 2001; Scozzari et al. 2001) we deduced that J2e-M12 lineages are distinctive from all other J2-M172 lineages on the basis of complex DYS413 and YCAII dinucleotide STRs. In corroboration we confirmed by sequencing the simple repeat locus DYSA7.2 that J2e-M12 is exclusively associated with shorter seven- or eight-tetranucleotide repeat alleles in Turkey. The considerable diversification observed in the J clade as exemplified by high variance of J2-M172 and a J-M304* lineage in southeastern Anatolia, is consistent with the early onset of post glacial sedentism found in the archeological record of Anatolia and the Levant (Bar-Yosef 1998).
G-M201 and post ice-age expansions in Europe
Although recurrent mutation can occur in the complex 49a,f RFLP polymorphic system, the TaqI ht8 restriction profile occurs only within haplogroup J and G lineages (Semino et al. 2000a), suggesting common ancestry. The overlap of J and G lineages with geography bolsters this putative affinity. The apparent scarcity of Upper Paleolithic sites in Anatolia (Kuhn 2002) and the considerable diversification of haplogroup G and J ancestors are consistent with an Upper Paleolithic/Mesolithic Middle East/Mesopotamian origin and the subsequent gradual proliferation of agriculturalists, including their presence (e.g., Çatal Höyük, region 7) during the early Pre-Pottery Neolithic B period (~9,500 BP). Haplogroup G-M201 lineages occur at ~30% in Georgia (Semino et al. 2000a) and the north Caucasus (Nasidze et al. 2003). Haplogroup G-M201 also occurs in Southeast Europe and the Mediterranean (Semino et al. 2000a) and in Iraq (Al-Zahery et al. 2003). In a material context, the Bronze Age Hattic and Kaska cultural region in Anatolia (Fig. 1) has affinity to the Maikop culture of the Caucasus and linguistic affinities to the northwest Caucasian languages (Renfrew 1998). Populations that speak such languages show a high frequency of G-M201 (Nasidze et al. 2003). Haplogroup G2-P15 is the most frequent (9%) G sub-clade in Turkey. G2-P15 lineages have been observed throughout the Middle East with a maximum of 19% in the Druze (Hammer et al. 2000) and an average of 5% in Italy and Greece (Di Giacomo et al. 2003). The expansion time estimates for G2-P15 closely approximate those predicted for R1b3-M269.
Role of R1b3-M269 in the Aurignacian and Neolithic eras
Haplogroup R1b3-M269 is one of the most common binary lineages observed in Turkey. The phylogenetic and spatial distribution of its equivalent in Europe (Cruciani et al. 2002), the R1-M173 (xM17) lineage for which considerable data exist (Semino et al. 2000a; Wells et al. 2001; Kivisild et al. 2003), implies that R1b3-M269 was well established throughout Paleolithic Europe, probably arriving from West Asia contemporaneous with Aurignacian culture. Although the phylogeographic pattern of R1b3-M269 lineages in Europe suggests that R1-M173* ancestors first arrived from West Asia during the Upper Paleolithic, we cannot deduce whether R1b3-M269 first entered Anatolia via the Bosporus isthmus or from the opposite, eastern direction. However, archeological evidence supports the view that Aurignacian culture arrived in Anatolia from Europe during the Upper Paleolithic rather than from the Iranian plateau (Kuhn 2002).
Haplogroup R1b3-M269 occurs at 40–80% frequency in Europe and the associated STR variance suggests that the last ice age modulated R1b3-M269 distribution to refugia in Iberia and Asia Minor from where it subsequently radiated during the Late Upper Paleolithic and Holocene. The R1b3-M269 related, but opposite TaqI p49a,f ht15 and ht35 distributions reflect the re-peopling of Europe from Iberia and Asia Minor during that period. The R1b3-M269 variances and expansion time estimates of Iberian and Turkish lineages are similar to each other (Table 2) but higher than observed elsewhere (Table 4). Low variances for R1b3-M269 lineages have also been reported for Czech and Estonian populations (Kivisild et al. 2003).
In contrast, the R1-M173 related but offsetting clade R1a1-M17, is frequent (30–60%) in East Europe, Central Asia, and Northwest India (Semino et al. 2000a; Wells et al. 2001; Passarino et al. 2001; Kivisild et al. 2003). This pronounced R1-M173 related Y-chromosome substructure contrasts to the observed uniform frequency spectrum of the major mitochondrial DNA haplogroups in Europe. The higher frequency of R1a1-M17 lineages in eastern Turkey is consistent with an entry into Anatolia via the Iranian plateau where the associated variance is appreciably higher (Quintana-Murci et al. 2001). The most common R1a1-M17 haplotype in Armenia (Weale et al. 2001) matches the most common in Turkey.
Haplogroup I-M170 indicates gene flows from Croatia
The phylogeography and high associated variance of I-M170 are consistent with an in situ European origin of M170 in the Balkans (Semino et al. 2000a), possibly near the Dinaric Mountain chain in Croatia, where it has been observed at the highest frequency known so far (Barac et al. 2003). I-M170 lineages radiated both towards north central Europe and into western Turkey. Comparison of STR haplotypes indicates that the Dinaric modal haplotype is associated with the I-P37 lineages observed in Turkey. Molecular analyses of I-M170 group lineages at equivalent resolution in modern day Bulgaria, Croatia and Greece will be required to better understand the phylogeography of I-M170 sub-clades.
Haplogroup E3b and Neolithic expansions
While both E3b1-M78 and E3b3-M123 occur at similar frequency in Turkey, the variance of the former is considerably lower than the latter suggesting either temporal or effective population size differences. The prevalence of haplogroup E (xM2) chromosomes in northern Egypt may reflect the source of non-African E3b lineages (Manni et al. 2002). Haplogroup E3b1-M78 haplotypes typify European lineages (Semino, unpublished) and have expansion dates consistent with expansion of agriculturalists (Table 2). Haplogroup J2-M172 lineages likely reflect the introduction of agriculture to India from the Middle East (Kivisild et al. 2003). However, the absence of E3b lineages in India supports the inference that the higher variance and older expansion dates for E3b3-M123 in Turkey do not reflect an earlier dispersal, but rather multiple founders with more associated diversity.
The spread of haplogroup L-M11 lineages is largely restricted to populations of the south Caucasus (Weale et al. 2001), Middle East (Nebel et al. 2001b), Pakistan (Qamar et al. 2002) and India (Kivisild et al. 2003). Interestingly, Turkish L lineages lack the M27 mutation that characterizes Indian and Pakistani L lineages. Although no M27 data exist for Armenians, the haplogroup L modal haplotype at six STR loci in Armenians (Weale et al. 2001) matches the most common Turkish counterpart. An attempt to interpret other informative lineages in Turkey, such as I1a-M253, J2a-M47, J2f-M67, K2-M70 and N-M231, is premature until they are adequately surveyed elsewhere.
Minor genetic influence of Turkic speakers
Various estimates exist of the proportion of gene flow associated with the arrival of Central Asian Turkic speaking people to Anatolia. One study based on analyses of six STR loci in 88 Y-chromosomes from Turkey suggested only a 10% contribution (Rolf et al. 1999). Another study suggests roughly 30% based upon mtDNA control region sequences and one binary and six STR Y-chromosome loci analyzed in 118 Turkish samples (Di Benedetto et al. 2001). While it is likely that gene flow between Central Asia and Anatolia has occurred repeatedly throughout prehistory, uncertainties regarding source populations and the number of such episodes between Central Asia and Europe confound any assessment of the contribution of the 11th century AD Oghuz nomads responsible for the Turkic language replacement. These new Y-chromosome data provide candidate haplogroups to differentiate lineages specific to the postulated source populations, thus overcoming potential artifacts caused by indistinguishable overlapping gene flows. The best candidates for estimations are Asian-specific haplogroups C-RPS4Y (Wells et al. 2001; Karafet et al. 2001; Zerjal et al. 2003) and O3-M122 (Su et al. 2000). These lineages occur at 1.5% in Turkey (8/523). Using Central Asian Y-chromosome data from either 13 populations and 149 samples (Underhill et al. 2000) or 49 populations and 1,935 samples (Wells et al. 2001), where these diagnostic lineages occur at 33% and 18%, respectively, their estimated contributions range from 0.0153/0.329×100=4.6% to 0.0153/0.180×100=8.5%. During the Bronze Age the population of Anatolia expanded, reaching an estimated level of 12 million during the late Roman Period (Russell 1958). Such a large pre-existing Anatolian population would have reduced the impact of the subsequent arrival of Turkic speaking Seljuk and Osmanlı groups from Central Asia.
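The contribution range quoted above is a simple ratio of the diagnostic lineage frequency in Turkey to its frequency in each Central Asian reference set. The sketch below reproduces that arithmetic; the function name is illustrative, and the calculation rests on the simplifying assumptions that these lineages were absent from the pre-existing Anatolian population and that the reference frequencies represent the historical source.

```python
def admixture_percent(freq_in_recipient, freq_in_source):
    """Estimated source contribution (%) from a source-specific lineage."""
    return 100.0 * freq_in_recipient / freq_in_source


# Frequencies as given in the text.
turkey_freq = 8 / 523          # C-RPS4Y plus O3-M122 chromosomes in Turkey
source_freq_small = 0.329      # 13 populations, 149 samples
source_freq_large = 0.180      # 49 populations, 1,935 samples

low_estimate = admixture_percent(turkey_freq, source_freq_small)    # about 4.6
high_estimate = admixture_percent(turkey_freq, source_freq_large)   # about 8.5
```

The estimate is sensitive to the choice of reference populations: the lower the diagnostic lineage frequency assumed for the source, the larger the inferred contribution.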
Although the genetic legacy of Anatolia remains somewhat inchoate, our excavations of these new levels of shared Y-chromosome heritage and subsequent diversification provide new clues to Anatolian prehistory, as well as a substantial foundation for comparisons with other populations. Our results demonstrate Anatolia’s role as a buffer between culturally and genetically distinct populations, being both an important source and recipient of gene flow.
We are grateful to all the donors for providing DNA samples for this study. This study was supported by NIH grants GM28428 and GM55273 to L.L.C-S and by Progetti Ricerca Interesse Nazionale 2002 and CNR “Beni Culturali” to O.S. We thank C. Edmonds for regression analyses associated with DYS389 calibration.