Human Genetics, Volume 114, Issue 2, pp 127–148

Excavating Y-chromosome haplotype strata in Anatolia

Authors

  • Cengiz Cinnioğlu
    • Department of Genetics, Stanford University School of Medicine
    • Institute of Forensic Sciences, Istanbul University
  • Roy King
    • Department of Psychiatry and Behavioral Sciences, Stanford University
  • Toomas Kivisild
    • Estonian Biocentre and Tartu University
  • Ersi Kalfoğlu
    • Institute of Forensic Sciences, Istanbul University
  • Sevil Atasoy
    • Institute of Forensic Sciences, Istanbul University
  • Gianpiero L. Cavalleri
    • Department of Genetics, Stanford University School of Medicine
  • Anita S. Lillie
    • Department of Genetics, Stanford University School of Medicine
  • Charles C. Roseman
    • Anthropological Sciences, Stanford University
  • Alice A. Lin
    • Department of Genetics, Stanford University School of Medicine
  • Kristina Prince
    • Department of Genetics, Stanford University School of Medicine
  • Peter J. Oefner
    • Stanford Genome Technology Center
  • Peidong Shen
    • Stanford Genome Technology Center
  • Ornella Semino
    • Dipartimento di Genetica e Microbiologia, Università di Pavia
  • L. Luca Cavalli-Sforza
    • Department of Genetics, Stanford University School of Medicine
Original Investigation

DOI: 10.1007/s00439-003-1031-4

Cite this article as:
Cinnioğlu, C., King, R., Kivisild, T. et al. Hum Genet (2004) 114: 127. doi:10.1007/s00439-003-1031-4

Abstract

Analysis of 89 biallelic polymorphisms in 523 Turkish Y chromosomes revealed 52 distinct haplotypes with considerable haplogroup substructure, as exemplified by their respective levels of accumulated diversity at ten short tandem repeat (STR) loci. The major components (haplogroups E3b, G, J, I, L, N, K2, and R1; 94.1%) are shared with European and neighboring Near Eastern populations, in contrast to only a minor share of haplogroups of Central Asian (C, Q and O; 3.4%), Indian (H, R2; 1.5%) and African (A, E3*, E3a; 1%) affinity. The expansion times for 20 haplogroup assemblages were estimated from associated STR diversity. This comprehensive characterization of Y-chromosome heritage addresses many multifaceted aspects of Anatolian prehistory, including: (1) the most frequent haplogroup, J, splits into two sub-clades, one of which (J2) shows decreasing variances with increasing latitude, compatible with a northward expansion; (2) haplogroups G1 and L show affinities with south Caucasus populations in their geographic distribution as well as in their STR motifs; (3) the frequency of haplogroup I, which originated in Europe, declines with increasing longitude, indicating gene flow arriving from Europe; (4) conversely, haplogroup G2 radiates towards Europe; (5) haplogroup E3b3 displays a latitudinal correlation, with frequency decreasing northward; (6) haplogroup R1b3 emanates from Turkey towards Southeast Europe and Caucasia; and (7) high-resolution SNP analysis provides evidence of a detectable yet weak signal (<9%) of recent paternal gene flow from Central Asia. The variety of Turkish haplotypes bears witness to Turkey being both an important source and recipient of gene flow.

Introduction

The Anatolian Peninsula (Asia Minor) provides an important geographic link between the Middle East, Asia and Europe. Accordingly, this region manifests an elaborate genetic constitution reflecting the consequences of numerous gene flow, admixture and local differentiation processes spanning from the late Pleistocene to the present day (Cavalli-Sforza et al. 1994). Both environmental and cultural influences associated with the spread of the Upper Paleolithic industries (Kuhn 2002), the Last Glacial Maximum (LGM) and Holocene warming since the Younger Dryas cold reversal, as well as the introduction of agriculture and the succeeding Bronze Age, Greek, and Roman presence, may have left detectable traces in the gene pool. In addition, resettlements from Central Asia (Richards et al. 2000), as well as movements during the Ottoman Empire, including the recent exchanges of numerous Greek and Turkish residents on the basis of religious affiliation during the 1920s, would add further potential complexity to the phylogeographic patterns in Anatolia. The question we ask in this paper is whether any elements of the amalgamated Anatolian genetic composition can be attributed to relatively ancient or recent chronologies and populations. While most human genetic diversity is affected by recombination, the low effective population size of clonal Y-chromosome segments (Shen et al. 2000) makes them more sensitive to events in the demographic histories of populations that may otherwise leave little imprint on the autosomal elements of the gene pool. The resulting, often non-random, correlations of binary-marker-defined haplogroups with geography (Underhill et al. 2001) and the corresponding short tandem repeat (STR) variance (de Knijff 2000) provide a genetic metric with which to sieve through complex deposits of human history on both micro-geographic and temporal scales.
To better understand how the succession and magnitude of events spanning millennia have contributed to the current genetic composition of Turkey, we assessed patterns of Y-chromosome diversity across Turkey, including Istanbul. The data illuminate numerous long-standing themes, including the Holocene expansions, the contributions of agriculturalists to the European gene pool, and the genetic assessment of Caucasian and Central Asian gene flows.

Materials and methods

Samples

A total of 523 samples from 90 cities, plus Istanbul, were studied. With the exception of 79 samples from cosmopolitan Istanbul, all remaining 444 samples were assigned to regions commonly distinguished by climate and rainfall (Fig. 1), based upon the paternal residential heritage stated during the informed consent process. The coordinates for each of the nine regions were determined by averaging the latitudes and longitudes of the cities in the region, weighted by the number of samples from each city. The respective latitude (N) and longitude (E) by region are: (1) 40.9, 28.1; (2) 41.5, 33.7; (3) 40.8, 38.6; (4) 39.2, 40.7; (5) 37.5, 39.1; (6) 36.7, 34.5; (7) 39.3, 34.7; (8) 38.3, 28.6; (9) 41.0, 29.1. To test a hypothesis of Bronze Age gene flow from the Caucasus, a different geographic definition was employed. Specifically, we divided the Anatolian peninsula into two sections bounded by a curve enclosing the cities within 50 km of the Kızılırmak River and the east Pontic region 3 (Fig. 1). This zone comprises the historically attested Bronze Age Hattic and Kaska cultural horizons. A χ2 test was used to compare haplotype frequencies across the two archeological regions. On the basis of the known high frequency of G-M201 in populations from the Caucasus (Semino et al. 2000a; Nasidze et al. 2003), an a priori hypothesis was tested comparing the G-M201 frequency inside the Hattic-Kaska delineated zone with that outside it. A total of 359 samples were from blood banks, 61 from paternity clinics and 103 from staff and students of Istanbul University. DNA was isolated from blood leucocytes using Qiagen reagents and protocols.
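The regional coordinates above are sample-weighted means of city coordinates. A minimal Python sketch of that averaging follows, assuming a plain arithmetic mean of degrees (adequate for regions a few hundred kilometres across, and what the description suggests) rather than a spherical centroid. The helper name and the three example cities, with their coordinates and sample counts, are hypothetical and are not the study's per-city data.

```python
def weighted_centroid(cities):
    """Sample-weighted mean latitude and longitude of one region.

    cities: iterable of (latitude_N, longitude_E, n_samples) tuples.
    Uses an arithmetic mean of degrees, not a spherical centroid.
    """
    cities = list(cities)
    total = sum(n for _, _, n in cities)
    if total <= 0:
        raise ValueError("region has no samples")
    lat = sum(la * n for la, _, n in cities) / total
    lon = sum(lo * n for _, lo, n in cities) / total
    return lat, lon


# Hypothetical region: three cities with illustrative coordinates and sample counts.
example_region = [(41.0, 28.9, 10), (40.2, 29.1, 5), (41.7, 26.6, 5)]
lat, lon = weighted_centroid(example_region)
print(f"{lat:.2f} N, {lon:.2f} E")
```

Weighting by sample count pulls each regional coordinate toward the cities that contributed most chromosomes, which is what makes the centroid appropriate for correlating regional haplogroup frequencies with latitude and longitude.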
Fig. 1

Map of sample locations. City name codes by region are: 1.01 Akyazı, 1.02 Babaeski, 1.03 Bilecik, 1.04 Bursa, 1.05 Çanakkale, 1.06 Edirne, 1.07 Erdek, 1.08 Izmit, 1.09 Kırklareli, 1.10 Sakarya, 1.11 Saray, 1.12 Sumnu, 1.13 Tekirdağ, 1.14 Yalova; 2.01 Bartın, 2.02 Düzce, 2.03 Gerze, 2.04 Karabük, 2.05 Kastamonu, 2.06 Safranbolu, 2.07 Sinop, 2.08 Uzuntaş, 2.09 Zile, 2.10 Zonguldak; 3.01 Amasya, 3.02 Artvin, 3.03 Bafra, 3.04 Bayburt, 3.05 Giresun, 3.06 Gümüşhane, 3.07 Mesudiye, 3.08 Ordu, 3.09 Perşembe, 3.10 Rize, 3.11 Samsun, 3.12 Sürmene, 3.13 Tokat, 3.14 Trabzon; 4.01 Ağrı, 4.02 Ardahan, 4.03 Bingöl, 4.04 Bitlis, 4.05 Doğubeyazıt, 4.06 Elazığ, 4.07 Erzincan, 4.08 Erzurum, 4.09 Iğdır, 4.10 Kars, 4.11 Malatya, 4.12 Muş, 4.13 Pervari, 4.14 Sarıkamış, 4.15 Tunceli, 4.16 Van; 5.01 Adıyaman, 5.02 Diyarbakır, 5.03 Gaziantep, 5.04 Kilis, 5.05 Mardin, 5.06 Siirt, 5.07 Urfa; 6.01 Adana, 6.02 Antakya, 6.03 Antalya, 6.04 Burdur, 6.05 Iskenderun, 6.06 Isparta, 6.07 Mersin, 6.08 Samandağ, 6.09 Tarsus; 7.01 Ankara, 7.02 Çankırı, 7.03 Çorum, 7.04 Ereğli, 7.05 Eskişehir, 7.06 Karaman, 7.07 Kayseri, 7.08 Kırıkkale, 7.09 Kırşehir, 7.10 Konya, 7.11 Nevşehir, 7.12 Niğde, 7.13 Sivas, 7.14 Ürgüp, 7.15 Yozgat; 8.01 Afyon, 8.02 Aydın, 8.03 Denizli, 8.04 Izmir, 8.05 Manisa, 8.06 Muğla, 8.07 Sandıklı, 8.08 Simav, 8.09 Uşak; 9.01 Istanbul.

Polymorphisms and haplotyping

Most polymorphisms have been reported previously (Underhill et al. 2001; Y Chromosome Consortium 2002). Details for new informative markers are summarized in Table 1. Genotyping was done using DHPLC methodology (Oefner et al. 1998), following a hierarchical phylogenetic approach. Lineages are referred to in the text by haplogroup and terminal mutation according to standardized nomenclature (Jobling et al. 2003). All 523 samples were also analyzed at ten STR loci, DYS19, DYS388, DYS390, DYS391, DYS392, DYS393, DYS389I, DYS389II (Kayser et al. 1997), DYS439 (Ayub et al. 2000) and DYSA7.2 (also called DYS461) (White et al. 1999), using 5′-labeled fluorescent primers, an ABI 3100 capillary sequencer, internal size standards and GeneScan fragment analysis software. Absolute fragment sizes were converted to numbers of allele repeats using results obtained by sequencing both strands of control samples independently amplified with unlabeled primers. Sequencing of DYS389 is complicated because the standard genotyping primers amplify two fragments (Rolf et al. 1998). DYS389 was calibrated in control DNA by using the primer ABCDE 5′-ccatcgacctatctgtctctattata-3′ together with the conventional reverse primer (Kayser et al. 1997) to amplify a single fragment of approximately 518 bp encompassing five tetranucleotide motifs. Subsequent sequencing with the same amplification primers allowed precise determination of the allele repeat counts for the four traditionally reported variable tetranucleotide regions (ABCD). The DYS389II (AB fragment) repeat number was determined by subtracting the DYS389I (CD fragment) repeat number (Cooper et al. 1996).
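The size-to-repeat conversion can be sketched as an offset from a sequenced control allele. This is a minimal illustration, not the authors' pipeline: the helper names, the 1 bp tolerance and the control allele sizes are hypothetical, and it assumes that length differences between alleles arise only from whole tetranucleotide units. The second helper encodes one reading of the DYS389 step, in which the AB count is the total DYS389 repeat count minus the DYS389I (CD) count.

```python
def repeats_from_size(size_bp, ref_size_bp, ref_repeats, motif_len=4, tol_bp=1.0):
    """Convert a measured fragment length to a repeat count.

    ref_size_bp / ref_repeats describe a control allele whose repeat count was
    determined by sequencing. Sizes that are not within tol_bp of a whole number
    of motif_len-bp steps from the control are rejected as off-ladder.
    """
    delta = size_bp - ref_size_bp
    steps = round(delta / motif_len)
    if abs(delta - steps * motif_len) > tol_bp:
        raise ValueError(f"{size_bp} bp is off-ladder relative to the control allele")
    return ref_repeats + steps


def dys389_ab(dys389_total, dys389i_cd):
    """AB repeat count obtained by subtracting the CD (DYS389I) count from the total."""
    return dys389_total - dys389i_cd


# Hypothetical control: a 13-repeat allele sized at 250.3 bp on this run.
print(repeats_from_size(258.1, 250.3, 13))  # -> 15
print(dys389_ab(29, 13))                    # -> 16
```

Calibrating against a sequenced control on the same run absorbs the mobility offset between apparent and true fragment length, which is why the conversion is anchored on a control allele rather than on nominal sizes.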
Table 1

Description of Y-chromosome binary polymorphisms

Marker | Nucleotide change | Position (bp) | Forward 5′→3′ | Reverse 5′→3′ | Total size (bp)
M231 | G to A | 110 | cctattatcctggaaaatgtgg | attccgattcctagtcacttgg | 331
M241 | G to A | 54 | aactcttgataaaccgtgctg | tccaatctcaattcatgcctc | 366
M242 | C to T | 180 | aactcttgataaaccgtgctg | tccaatctcaattcatgcctc | 366
M253 | C to T | 283 | gcaacaatgagggtttttttg | cagctccacctctatgcagttt | 400
M267 | T to G | 148 | ttatcctgagccgttgtccctg | tgtagagacacggttgtaccct | 287
M285 | G to C | 70 | ttatcctgagccgttgtccctg | tgtagagacacggttgtaccct | 287
M286 | G to A | 129 | ttatcctgagccgttgtccctg | tgtagagacacggttgtaccct | 287
M287 | A to T | 100 | ttatcctgagccgttgtccctg | tgtagagacacggttgtaccct | 287
M304 | A to C | 421 | caaagtgctgggattacagg | cttctagcttcatctgcattgt | 527
M335 | T to A | 162 | aagaaatgttgaactgaaagttgat | aggtgtatctggcatccgtta | 417
M339 | T to G | 285 | aggcaggacaactgagagca | tgcttgatcctgggaagt | 517
M340 | G to C | 218 | ccagtcagcagtacaaaagttg | gcatttctttgattatagaagcaa | 386
M342 | C to T | 52 | agagagttttctaacagggcg | tgggaatcacttttgcaact | 173
M343 | C to A | 402 | tttaacctcctccagctctgca | acccccacatatctccagg | 424
M349 | G to T | 209 | tgggattaaaggtgctcatg | caaaattggtaagccattagct | 493
M359 | T to C | 122 | cgtctatggccttgaaga | tccgaaaatgcagacttt | 447
M365 | A to G | 246 | ccttcatttaggctgtagctgc | tgtatctttagttgagatgg | 274
M367 | A to G | 196 | ccttcatttaggctgtagctgc | tgtatctttagttgagatgg | 274
M368 | A to C | 200 | ccttcatttaggctgtagctgc | tgtatctttagttgagatgg | 274
M369 | G to C | 45 | ccttcatttaggctgtagctgc | tgtatctttagttgagatgg | 274
M370 | C to G | 166 | ccttcatttaggctgtagctgc | tgtatctttagttgagatgg | 274
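Several markers in Table 1 share a primer pair and amplicon (M267, M285, M286 and M287, for example, all lie in the same 287-bp product), so one amplification presumably serves to screen several sites by DHPLC. The short Python sketch below uses rows transcribed from Table 1. It checks that each reported position falls inside its amplicon, assuming positions are counted within the amplified fragment, and groups markers by shared primer pair. The names `TABLE1` and `shared_amplicons` are illustrative.

```python
from collections import defaultdict

# (marker, position_bp, forward_5to3, reverse_5to3, amplicon_bp), transcribed from Table 1
TABLE1 = [
    ("M231", 110, "cctattatcctggaaaatgtgg", "attccgattcctagtcacttgg", 331),
    ("M241", 54, "aactcttgataaaccgtgctg", "tccaatctcaattcatgcctc", 366),
    ("M242", 180, "aactcttgataaaccgtgctg", "tccaatctcaattcatgcctc", 366),
    ("M253", 283, "gcaacaatgagggtttttttg", "cagctccacctctatgcagttt", 400),
    ("M267", 148, "ttatcctgagccgttgtccctg", "tgtagagacacggttgtaccct", 287),
    ("M285", 70, "ttatcctgagccgttgtccctg", "tgtagagacacggttgtaccct", 287),
    ("M286", 129, "ttatcctgagccgttgtccctg", "tgtagagacacggttgtaccct", 287),
    ("M287", 100, "ttatcctgagccgttgtccctg", "tgtagagacacggttgtaccct", 287),
    ("M304", 421, "caaagtgctgggattacagg", "cttctagcttcatctgcattgt", 527),
    ("M335", 162, "aagaaatgttgaactgaaagttgat", "aggtgtatctggcatccgtta", 417),
    ("M339", 285, "aggcaggacaactgagagca", "tgcttgatcctgggaagt", 517),
    ("M340", 218, "ccagtcagcagtacaaaagttg", "gcatttctttgattatagaagcaa", 386),
    ("M342", 52, "agagagttttctaacagggcg", "tgggaatcacttttgcaact", 173),
    ("M343", 402, "tttaacctcctccagctctgca", "acccccacatatctccagg", 424),
    ("M349", 209, "tgggattaaaggtgctcatg", "caaaattggtaagccattagct", 493),
    ("M359", 122, "cgtctatggccttgaaga", "tccgaaaatgcagacttt", 447),
    ("M365", 246, "ccttcatttaggctgtagctgc", "tgtatctttagttgagatgg", 274),
    ("M367", 196, "ccttcatttaggctgtagctgc", "tgtatctttagttgagatgg", 274),
    ("M368", 200, "ccttcatttaggctgtagctgc", "tgtatctttagttgagatgg", 274),
    ("M369", 45, "ccttcatttaggctgtagctgc", "tgtatctttagttgagatgg", 274),
    ("M370", 166, "ccttcatttaggctgtagctgc", "tgtatctttagttgagatgg", 274),
]


def shared_amplicons(rows):
    """Group markers by (forward, reverse, size); keep groups with more than one marker."""
    groups = defaultdict(list)
    for marker, pos, fwd, rev, size in rows:
        if not 0 < pos <= size:
            raise ValueError(f"{marker}: position {pos} outside {size}-bp amplicon")
        groups[(fwd, rev, size)].append(marker)
    return [markers for markers in groups.values() if len(markers) > 1]


for markers in shared_amplicons(TABLE1):
    print(", ".join(markers))
```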

Haplogroup diversity

STR variance, averaged over ten loci on binary haplotype backgrounds with sample sizes ≥7, was used to assess the relative level of diversity and phylogenetic substructure with geography. F tests, based upon the ratio of χ2 distributions of average variances, were used to compare average variances amongst geographic regions. STR data were also used to estimate haplogroup-specific expansion times by two methods. Both approaches assume a stepwise mutation model and an average evolutionary STR mutation rate of 0.0007 per locus per generation (Zhivotovsky et al. 2003), a value based upon a generation time of 25 years. One method assumes a star-like genealogy characteristic of continuous population growth, in which the variance equals the mutation rate per generation times the number of generations since expansion (Di Rienzo et al. 1994; Kittles et al. 1998). The other method employs a Bayesian algorithm: to estimate the time of Anatolian population expansions, we used the Markov chain Monte Carlo (MCMC) approach (Wilson et al. 1998) implemented in the program BATWING to estimate posterior distributions for the parameters of a given model of population history.

We considered a model of exponential growth from an initially constant population size, beginning at time Beta, with the prior distribution of effective population size specified as a gamma (1, 0.0001), as used by Weale et al. (2001). The prior distribution for the STR mutation rate was specified as a gamma distribution with a mean of 7×10⁻⁴ per locus per generation, and the prior distribution of the growth rate was a gamma (1, 0.001). Beta was assigned a broad uniform prior (0, 15). Priors were chosen to be as uninformative as possible so as to minimally affect the results. We calculated the mean, median, and 2.5% and 97.5% quantiles of the posterior distribution of Beta, the estimated time of population expansion. Beta is expressed as a fraction of the initial population size and was multiplied here by the generation time to yield values in standard units of time. Calculations were based on 50,000 runs of the MCMC estimator after a 20,000-run burn-in.
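The posterior summaries of Beta (mean, median, and 2.5% and 97.5% quantiles after discarding the burn-in) can be sketched as follows. The draws here are synthetic gamma variates standing in for a chain. They are not BATWING output, and the program's own output format is not modeled. The linearly interpolated quantile is one common convention among several, and the function name is illustrative.

```python
import random
import statistics


def summarize_posterior(draws, burn_in):
    """Mean, median and 2.5%/97.5% quantiles of draws after the burn-in."""
    kept = sorted(draws[burn_in:])
    if len(kept) < 2:
        raise ValueError("need at least two post-burn-in draws")

    def quantile(q):
        # Linear interpolation between neighbouring order statistics.
        pos = q * (len(kept) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(kept) - 1)
        return kept[lo] + (kept[hi] - kept[lo]) * (pos - lo)

    return {
        "mean": statistics.fmean(kept),
        "median": statistics.median(kept),
        "q2.5": quantile(0.025),
        "q97.5": quantile(0.975),
        "n_kept": len(kept),
    }


# Synthetic stand-in chain: 20,000 burn-in draws followed by 50,000 retained draws.
rng = random.Random(1)
chain = [rng.gammavariate(2.0, 5.0) for _ in range(70_000)]
summary = summarize_posterior(chain, burn_in=20_000)
print({k: round(v, 2) for k, v in summary.items()})
```

For skewed posteriors such as those in Table 2, the mean can sit far above the median, which is why both are reported alongside the quantile interval.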

Results

A total of 69 of the 89 binary polymorphisms genotyped were informative and defined 52 distinct haplotypes. Their phylogenetic relationships and frequency distribution by geographic region are shown in Fig. 2. While none of the major haplogroups (E, G, J, R) showed significant micro-geographic structure, analysis at higher binary and STR haplotype resolution revealed some distinct phylogeographic patterns. All STR data, arranged by corresponding binary composition, are given in Appendix table A.
Fig. 2

Phylogenetic relationships, nomenclature and haplogroup frequencies. The 65 informative markers that were haplotyped are indicated in dark font; the remaining five markers, shown in italics, are included to provide phylogenetic context. The following 19 polymorphisms were also genotyped but not observed: M3, M18, M26, M27, M33, M37, M38, M60, M62, M68, M75, M132, M137, M163, M166, M174, M181, M210, M222
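The hierarchical genotyping approach described under Polymorphisms and haplotyping types a downstream marker only on chromosomes that are derived at its parent marker. The minimal sketch below walks a deliberately simplified subtree of haplogroup J built from markers named in the text. The full tree in Fig. 2 has more branches. `J_TREE`, `LABELS` and `assign` are hypothetical names, a caller-supplied function stands in for each DHPLC assay, and sibling markers are assumed to be mutually exclusive.

```python
# Simplified, illustrative subtree of haplogroup J; not the full Fig. 2 tree.
J_TREE = {
    "M304": ["M267", "M172"],
    "M267": ["M365", "M367", "M368", "M369"],
    "M172": ["M12", "M47", "M67"],
}
LABELS = {"M304": "J", "M267": "J1", "M172": "J2", "M12": "J2e", "M67": "J2f"}


def assign(is_derived, root="M304", tree=J_TREE):
    """Descend from root, typing children only below a derived marker.

    is_derived: callable(marker) -> bool, a stand-in for one genotyping assay.
    Returns (haplogroup label or None, list of markers actually typed).
    """
    typed = [root]
    if not is_derived(root):
        return None, typed
    node = root
    while True:
        next_node = None
        for child in tree.get(node, []):
            typed.append(child)
            if is_derived(child):
                next_node = child  # siblings are mutually exclusive, so stop here
                break
        if next_node is None:
            break
        node = next_node
    # "*" marks a paragroup: derived at node but ancestral at all of its children.
    star = "*" if tree.get(node) else ""
    return f"{LABELS.get(node, '?')}-{node}{star}", typed


sample = {"M304", "M172", "M67"}  # hypothetical derived markers for one chromosome
print(assign(lambda m: m in sample))
```

The payoff of the hierarchy is that markers under ancestral branches are never typed: in this example a J2f chromosome needs six assays instead of all ten markers in the subtree.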

Y chromosome haplogroups and associated diversity

Haplogroup J is defined by the overarching DYS11/12f2 human endogenous retroviral polymorphism (Sun et al. 2000; Rosser et al. 2000). This polymorphism is widely distributed in Eurasia, the Middle East and North Africa (Hammer et al. 2001; Quintana-Murci et al. 2001). Although it is recurrent (Blanco et al. 2000), occurring independently on certain D and F haplogroup lineages (Y Chromosome Consortium 2002), its association with the additional M304 transversion now helps to define the J clade with less ambiguity. The J clade was preeminent in all nine regions, with an average frequency of 33%. Only one sample, with an atypical 12-repeat DYS388 allele (see Appendix table A), resolved to J-M304*. All other J lineages could be assigned to two sub-clades, J1 and J2, defined by the transversion mutations M267 and M172, respectively. Four new markers, M365, M367, M368 and M369, were detected in haplogroup J1-M267 samples, each in a single representative. In contrast, haplogroup J2-M172 fractionates into nine discrete lineages, three of which (defined by M12, M47 and M67) occur at informative frequencies, making them useful for detecting phylogeographic patterns. Grouping all sub-lineages, J2 (24%) is evenly distributed over Turkey (χ2=7.17, df=8, P=0.52), yet the relatively high J2-related STR variance (Table 2) shows a significant decline with increasing latitude (r=−0.87, P<0.002, Spearman).
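For an even number of degrees of freedom, the chi-square upper-tail probability has a closed form, which makes it easy to check the P values reported for the nine-region homogeneity tests (df=8). A minimal standard-library sketch follows, with an illustrative function name. For odd df, a library routine such as SciPy's `scipy.stats.chi2.sf` would be the usual choice.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X >= x) for a chi-square variable with positive even df.

    Uses sf = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# J2-M172 across nine regions: chi2 = 7.17 with df = 8 (reported P = 0.52)
print(round(chi2_sf_even_df(7.17, 8), 2))  # -> 0.52
```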
Table 2

Y-chromosome haplogroup variance and expansion times based on ten STR loci

Haplogroup | n | Variance | T (kyr)a | Beta mean (kyr)b | Beta median (kyr)b | Beta 2.5% (kyr)b | Beta 97.5% (kyr)b | Initial effective population size per 1,000 individuals
E3b1-M78 | 26 | 0.18 | 6.4 | 8.0 | 4.8 | 0.8 | 50.7 | 0.161 (0.045–0.630)
E3b3-M123 | 29 | 0.51 | 18.2 | 44.6 | 3.7 | 0.1 | 991.5 | 1.489 (0.140–0.569)
G-M201 | 57 | 0.40 | 14.3 | 44.6 | 20.3 | 0.3 | 489.6 | 0.161 (0.032–1.353)
G2-P15 | 50 | 0.35 | 12.5 | 31.0 | 15.5 | 0.4 | 372.6 | 0.780 (0.035–1.112)
G-M201(xP15) | 7 | 0.42 | 15.0 | 36.0 | 10.4 | 0.1 | 891.4 | 0.570 (0.061–3.317)
I-M170 | 28 | 0.50 | 17.9 | 13.4 | 8.0 | 0.1 | 79.0 | 1.340 (0.543–6.669)
I-M170(xP37) | 15 | 0.40 | 14.3 | 19.6 | 5.8 | 0.1 | 126.4 | 1.366 (0.198–3.764)
I1b-P37 | 13 | 0.23 | 8.2 | 19.1 | 9.1 | 0.9 | 101.2 | 0.183 (0.015–0.286)
J-M304 | 175 | 0.56 | 20.0 | 36.1 | 13.9 | 0.2 | 473.2 | 0.184 (0.036–1.306)
J1-M267 | 47 | 0.51 | 18.2 | 39.6 | 15.4 | 0.4 | 604.8 | 0.366 (0.061–1.895)
J1-M267, long DYS388 alleles | 39 | 0.39 | 13.9 | 31.9 | 18.9 | 0.5 | 273.9 | 0.113 (0.030–0.751)
J1-M267, short DYS388 allele | 8 | 0.25 | 8.9 | 14.3 | 1.4 | 0.0 | 512.4 | 0.385 (0.045–2.041)
J2-M172 | 127 | 0.52 | 18.6 | 17.9 | 14.5 | 2.0 | 78.5 | 0.821 (0.281–2.575)
J2-M172* | 75 | 0.47 | 16.8 | 36.1 | 16.9 | 0.6 | 471.8 | 0.349 (0.051–1.514)
J2f-M67 | 33 | 0.33 | 11.8 | 16.4 | 12.5 | 1.7 | 92.0 | 0.285 (0.850–1.076)
J2e-M12 | 9 | 0.24 | 8.6 | 12.5 | 4.0 | 0.0 | 306.6 | 0.334 (0.052–1.833)
K2-M70 | 13 | 0.36 | 12.9 | 39.4 | 9.0 | 0.0 | 1,093.4 | 0.647 (0.063–3.889)
L-M11 | 22 | 0.41 | 14.6 | 26.3 | 2.4 | 0.0 | 1,044.8 | 1.386 (0.150–5.181)
N-M231 | 20 | 0.28 | 10.0 | 20.6 | 6.9 | 0.2 | 326.0 | 0.176 (0.023–1.062)
Q-M242 | 10 | 0.46 | 16.4 | 23.3 | 10.7 | 0.1 | 289.4 | 0.600 (0.120–3.550)
R-M207 | 126 | 0.65 | 23.2 | 15.3 | 13.1 | 2.7 | 63.4 | 0.889 (0.369–2.452)
R1b3-M269 | 76 | 0.33 | 11.8 | 23.4 | 17.5 | 1.9 | 127.7 | 0.085 (0.029–0.349)
R1a1-M17 | 36 | 0.25 | 8.9 | 4.9 | 4.1 | 0.8 | 23.2 | 0.896 (0.342–2.947)

a Continuous growth model

b Bayesian exponential growth model; the mean, median and 2.5% and 97.5% quantiles of the posterior distribution are shown
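Under the continuous-growth (star genealogy) model, variance = μ × generations, so T = (V/μ) × 25 years. The short Python check below assumes only the mutation rate and generation time stated in Materials and methods, and reproduces the T column of Table 2 from the Variance column for a few rows. The function name is illustrative.

```python
MU = 0.0007        # STR mutations per locus per generation (Zhivotovsky et al. 2003)
GEN_YEARS = 25     # generation time underlying that rate


def expansion_time_kyr(avg_variance, mu=MU, gen_years=GEN_YEARS):
    """Continuous-growth estimate: generations = V / mu, converted to kyr."""
    generations = avg_variance / mu
    return generations * gen_years / 1000.0


# (haplogroup, average STR variance, T reported in Table 2)
rows = [("E3b1-M78", 0.18, 6.4), ("G-M201", 0.40, 14.3),
        ("J-M304", 0.56, 20.0), ("R-M207", 0.65, 23.2)]
for name, var, reported in rows:
    print(f"{name}: {expansion_time_kyr(var):.1f} kyr (Table 2: {reported})")
```

Because T is simply proportional to V, the ranking of haplogroups by this estimate is the same as their ranking by STR variance; only the Bayesian column can reorder them.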

The J2f-M67 lineages, which occur at 6.3% frequency overall, show a significantly negative correlation with distance from region 1 in the northwest (Table 3). Moreover, regions 1, 7, 8, and 9, which clustered together in an STR-based two-dimensional principal components analysis (not shown), displayed on average a significantly higher (P<0.011) frequency of J2f-M67 than the other regions.
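The text does not state how distance from region 1 was measured. As one plausible choice, the Python sketch below computes great-circle (haversine) distances from the region 1 centroid to each regional centroid listed in Materials and methods. It assumes a spherical Earth of radius 6,371 km, and the names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean spherical radius (assumption)

# Sample-weighted regional centroids (latitude N, longitude E) from Materials and methods
REGIONS = {
    1: (40.9, 28.1), 2: (41.5, 33.7), 3: (40.8, 38.6),
    4: (39.2, 40.7), 5: (37.5, 39.1), 6: (36.7, 34.5),
    7: (39.3, 34.7), 8: (38.3, 28.6), 9: (41.0, 29.1),
}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


dist_from_1 = {r: haversine_km(REGIONS[1], xy) for r, xy in REGIONS.items()}
for region, km in sorted(dist_from_1.items(), key=lambda kv: kv[1]):
    print(region, round(km))
```

Since Spearman's coefficient depends only on ranks, any distance measure that preserves the ordering of the nine regions (for example, a flat-map approximation at this scale) would give the same Table 3 column.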
Table 3

Correlations of Y-chromosome haplogroup frequencies with geography (Spearman’s correlation coefficients)

Haplogroup | n | Latitude | Longitude | Distance from region 1
E3b1-M78 | 26 | 0.430 | −0.430 | −0.420
E3b3-M123 | 29 | −0.717** | 0.633 | 0.633
G-M201 | 57 | −0.367 | 0.250 | 0.333
G-P15 | 50 | −0.333 | 0.100 | 0.150
I-M170 | 28 | 0.550 | −0.817 | −0.850*
J1-M267 | 47 | 0.133 | −0.067 | −0.050
J2-M172 | 128 | 0.267 | −0.033 | −0.133
J2f-M67 | 33 | 0.667** | −0.600 | −0.733**
L-M11 | 22 | 0.289 | 0.119 | 0.017
N-M231 | 20 | −0.418 | 0.042 | 0.092
R1a1-M17 | 36 | −0.600 | 0.680** | 0.650
R1b3-M269 | 76 | −0.183 | −0.183 | −0.100

*P<0.01 Spearman, n=9, two tailed; **P<0.05 Spearman, n=9, two tailed
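Each coefficient in Table 3 is a Spearman rank correlation across the nine regions. A minimal pure-Python version is sketched below. It computes the Pearson correlation of average ranks, which also handles ties. The per-region haplogroup frequencies are not reproduced in this section, so the example exercises the function on the regional latitudes and longitudes from Materials and methods instead. P values for n=9 would normally come from exact tables or a library routine such as `scipy.stats.spearmanr`. The function names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Regional centroids (regions 1-9) from Materials and methods.
latitude = [40.9, 41.5, 40.8, 39.2, 37.5, 36.7, 39.3, 38.3, 41.0]
longitude = [28.1, 33.7, 38.6, 40.7, 39.1, 34.5, 34.7, 28.6, 29.1]
print(round(spearman(latitude, longitude), 3))  # -> -0.383
```

A rank-based coefficient suits n=9 regional frequencies because it does not assume the frequencies change linearly with latitude or longitude, and a single extreme region cannot dominate it.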

Haplogroup J1-M267 occurs at 9% frequency and is also uniformly distributed across Turkey (χ2=9.02, df=8, P=0.34), with the exception of eight samples localized to the northern geographic periphery, all of which carry an unusual “short” 13-repeat DYS388 allele that was confirmed by sequencing. No intermediate-size 14-repeat DYS388 alleles were detected in any of our J1 samples. We propose that this subset of J1 lineages has a unique heritage since, besides the suggestive micro-geography, the occurrence of a short DYS388 allele on a J background is symptomatic of a deviation from the stepwise mutation process, as already proposed for this locus in a different allelic context (Nebel et al. 2001a).

The G-M201 haplogroup occurs at 10.9% frequency (57/523) with a mean STR variance of 0.40, consistent with its early presence in Anatolia. One major clade, G2-P15, and two less frequent sub-clades, G1-M285 and G3-M287, account for all the variation observed, except for one individual from Kars (region 4) who remained unresolved at the G-M201* level. Taken together, G lineages do not show micro-geographic structure on the basis of the nine geographic regions (χ2=9.21, df=8, P=0.33), but their frequency differs significantly (χ2=4.11, P<0.043) when the samples are divided along the archeological boundaries of the Bronze Age Hattic and Kaska cultures (Fig. 1). In addition, variances of G2-P15 lineages are correlated with longitude (r=−0.72, P<0.03, n=9), with higher variances towards western Anatolia. The distinctive G1-M285 lineages are restricted to region 3.
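The Hattic-Kaska comparison sets G-M201 carriers against other chromosomes inside and outside one zone. Assuming this is a 2×2 table with one degree of freedom (the df is not stated), its upper-tail probability is erfc(√(χ2/2)). The brief Python sketch below checks the reported χ2=4.11 against P<0.043 and shows the Pearson statistic for a 2×2 table. The counts in the example table are hypothetical and are not the study's inside/outside tallies, and the function names are illustrative.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_2x2(a, b, c, d):
    """Pearson chi-square, without continuity correction, for [[a, b], [c, d]].

    Rows could be zones (inside/outside), columns carrier status (G-M201/other).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom


print(round(chi2_sf_df1(4.11), 4))  # reported: P < 0.043
stat = chi2_2x2(10, 40, 20, 130)    # hypothetical counts
print(round(stat, 3), round(chi2_sf_df1(stat), 3))
```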

Haplogroup I-M170 is a major lineage cluster largely restricted to populations of Europe (Semino et al. 2000a). Despite its relatively low average frequency in Turkey (5.3%), five major sub-clades were detected. The I-M170 chromosomes are localized more towards the west and show significant negative correlations with longitude (r=−0.82) and with geographic distance from region 1 (r=−0.85), the European pole of Turkey (Table 3). While haplogroup I-M170 displays high overall STR variance (Table 2), the I1-P37 sub-clade, which accounts for almost half of the lineages, shows significantly lower variance than the other I lineages (Table 2).

Haplogroup E3b-M35 occurs at an overall frequency of 10.7%, with E3b1-M78 and E3b3-M123 accounting for all E representatives except a single E3b2-M81 chromosome. Although E3b1-M78 and E3b3-M123 occur at similar frequencies (5.0% and 5.5%, respectively), their associated mean STR variances (Table 2) are significantly different [F(280,250) = 2.83, P<0.01]. The more diverse Turkish E3b3-M123 lineages are negatively correlated with latitude (Table 3).
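The F statistics in the text are ratios of the average STR variances given in Table 2. The reported degrees of freedom match (n − 1) × 10 loci for each haplogroup (28 × 10 = 280 for E3b3-M123, 25 × 10 = 250 for E3b1-M78), so the sketch below uses that rule. The rule is inferred from the numbers and is not stated in the methods. With SciPy available, `scipy.stats.f.sf(F, df1, df2)` would give the upper-tail P value. The function name is illustrative.

```python
N_LOCI = 10


def variance_ratio_f(var_a, n_a, var_b, n_b, n_loci=N_LOCI):
    """Return (F, df1, df2) for comparing two average STR variances.

    df = (n - 1) * n_loci per haplogroup, inferred from the reported tests.
    """
    if var_b <= 0:
        raise ValueError("denominator variance must be positive")
    return var_a / var_b, (n_a - 1) * n_loci, (n_b - 1) * n_loci


# E3b3-M123 (n=29, V=0.51) vs E3b1-M78 (n=26, V=0.18): reported F(280,250) = 2.83
f1, d1, d2 = variance_ratio_f(0.51, 29, 0.18, 26)
print(f"F({d1},{d2}) = {f1:.2f}")
# R1b3-M269 (n=76, V=0.33) vs R1a1-M17 (n=36, V=0.25): reported F(750,350) = 1.32
f2, e1, e2 = variance_ratio_f(0.33, 76, 0.25, 36)
print(f"F({e1},{e2}) = {f2:.2f}")
```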

Haplogroup R-M207 lineages occur at 24.1% frequency overall, with the majority belonging to the R1-M173 sub-clade (Fig. 2). Only one R1-M173* lineage was observed, in eastern region 5. All but one (R1c-M343) of the remaining R1-M173-associated lineages belong to the R1a1-M17 and R1b-P25 sub-clades, with R1b3-M269 predominant at 14.5% overall in Turkey. Although R1b3-M269 lineages are found throughout Europe at considerable frequency (Cruciani et al. 2002), no PCR-compatible binary markers are currently known that subdivide this clade informatively. However, two TaqI haplotypes of the complex 49a,f RFLP locus, ht15 and ht35, are associated with R1b3-M269 lineages. The 49a,f ht15 form is rare in Turkey but common in Iberia (Semino et al. 1996), while 49a,f ht35 representatives are distributed across Europe (Torroni et al. 1990; Santachiara-Benerecetti et al. 1993; Semino et al. 2000b) and occur at ~10% in the Balkan region (Santachiara-Benerecetti, personal communication). To better understand the affinity of the frequent Turkish R1b3-M269 lineages relative to other regions, we analyzed the same battery of STR loci in 52 additional R1b3-M269 samples from Iberia, the Balkans, Iraq, Georgia, and Turkey that had previously been typed as 49a,f ht15 or ht35, as well as in 59 additional European R1b3-M269 samples. STR haplotype data for these 111 samples are given in Appendix table B. Principal component analysis of all 187 R1b3-M269 samples, using the ten STR loci as variables, reveals distributions coinciding with the samples of known 49a,f ht15 and ht35 constitution (Fig. 3). Most of the Turkish samples group with the Balkan and Caucasian 49a,f ht35 samples, while the West European samples associate with the 49a,f ht15 samples. The variance of 49a,f ht35-related chromosomes is lower in the Balkan, Caucasian and Iraqi representatives than in Turkey (Table 4).
Similarly, the variance is higher in Iberia than in Western Europe. The decreasing diversity radiating from Turkey towards Southeast Europe, the Caucasus and Mesopotamia parallels similar results from Iberia that trace the re-colonization of Northwest Europe by hunter-gatherers during the Holocene, as suggested by others (Torroni et al. 1998; Semino et al. 2000a; Wilson et al. 2001).
Fig. 3

Plot of 187 R1b3-M269-derived lineages on the first two principal components computed from ten microsatellite loci. The first component accounts for 19% of the total variance and the second for 16%. Samples whose p49a,f ht15 (n=13) or ht35 (n=39) status is known are indicated in red and yellow, respectively. Geographic areas include: Iberia (n=27), W. Europe (n=45), Turkey (n=79), Balkans (n=21), Georgia/Iraq (n=15). W. Europe includes France, Italy, Germany and Norway; Balkans includes Albania and Greece. Large symbols represent the means for the eight groups. The one Iberian ht15 outlier reflects the influence of an unusual DYS388 allele. Both the M269 and DYS388 results for this sample were confirmed by sequencing.
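The principal component analysis behind Fig. 3 treats each STR locus as a variable. The pure-Python sketch below illustrates the idea. It assumes PCA on the covariance matrix of raw repeat counts, since the text does not say whether loci were standardized, and extracts two components by power iteration with deflation. The six four-locus haplotypes are synthetic, component signs are arbitrary, and the input should vary in at least two independent directions.

```python
import math


def pca_top2(rows, iters=1000):
    """First two principal components of a samples x loci matrix of repeat counts.

    Returns (scores, explained): scores[i] = [pc1, pc2] for sample i, and
    explained[k] is the fraction of total variance carried by component k.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(x[i] * x[j] for x in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    total_var = sum(cov[i][i] for i in range(p))
    comps, eigvals = [], []
    for _ in range(2):
        v = [1.0 + 0.1 * j for j in range(p)]  # arbitrary non-degenerate start
        for _ in range(iters):  # power iteration: v <- C v / |C v|
            w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:
                break
            v = [t / norm for t in w]
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
        comps.append(v)
        eigvals.append(lam)
        # Deflate so the next power iteration converges to the next component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    scores = [[sum(x[j] * c[j] for j in range(p)) for c in comps] for x in centered]
    return scores, [lam / total_var for lam in eigvals]


# Synthetic repeat counts at four hypothetical loci for six samples.
haplotypes = [
    [14, 12, 24, 11],
    [14, 12, 24, 10],
    [15, 12, 23, 11],
    [13, 15, 23, 10],
    [13, 16, 22, 10],
    [12, 15, 22, 11],
]
scores, explained = pca_top2(haplotypes)
print([round(e, 2) for e in explained])
```

Using raw counts lets loci with wide allele ranges (such as DYS388 in Fig. 3, where one unusual allele produces an outlier) dominate the first components. Standardizing each locus first would be the alternative design choice.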

Table 4

Variance of R1b3-M269 and TaqI p49a,f Ht15, Ht35 STR haplotypes

Populationa | n | Variance
Turkey | 79 | 0.31
Iberia | 27 | 0.24
W. Europe | 45 | 0.22
Georgia | 15 | 0.22
Balkan | 21 | 0.18
p49a,f-Ht35 | 39 | 0.19
p49a,f-Ht15 | 13 | 0.18

a Populations grouped as given in Fig. 3

In Turks, R1b3-M269 and R1a1-M17 occur at 14.5% and 6.9%, respectively. In addition, R1b3-M269-related Y-STR variance is significantly higher than that of R1a1-M17 [F(750,350) = 1.32, P<0.01]. While no micro-geographic substructure is detected in Turkey for R1b3-M269, the frequency of R1a1-M17 is higher in eastern Turkey and its distribution correlates significantly with longitude across the nine regions (Table 3). The majority of L-M11 chromosomes occur in the most eastern regions 3 and 4 (χ2=17.99, df=8, P<0.021) and also have high levels of variance (Table 2).

Discussion

Assuming that natural selection plays a negligible role in Y-chromosome haplogroup distribution, the assessment of background STR variance can provide insights into haplogroup subdivision, size fluctuation, directionality of distribution and relative chronology amongst haplogroups. The haplogroup-specific variances may reflect associations with Upper Paleolithic, Holocene and agriculturalist processes. Although early agriculture in the Near East is almost contemporaneous with the onset of Holocene climatic warming, the consequences of growth and migration specifically due to agriculture are likely to be more recent.

Haplogroup J and the transition to agriculture

Although the entire J-M304 clade shows a large microsatellite variance that, under a continuous growth model, dates to around 20 kyr, consistent with the LGM, the BATWING exponential growth model reveals a more recent post-LGM expansion (13.9 kyr). This secondary expansion originates from a low effective population size (184 individuals; Table 2) and may indicate that the J clade in Turkey began to participate in demographic expansions during the onset of sedentism in Anatolia and the Levant, e.g., among the Natufians (Bar-Yosef 1998). Before then, J clade representatives would have been accumulating STR diversity via genetic drift within various small groups of mobile hunter-gatherers during the LGM. We detected a significant northward reduction of J2-M172 variance in Turkey. This latitudinal trend could be a consequence of an Upper Paleolithic presence of J2-M172 in southern Anatolia and its subsequent spread north and west during the Holocene, likely catalyzed by the transition to agriculture (Ammerman and Cavalli-Sforza 1984; Underhill 2002). The northward gradient in J2-M172 variance is consistent with the archeological evidence that the agro-pastoral economies of Northwest Anatolia were derived from the Çatal Höyük area in region 7 (Thissen 1999). The presence of J2-M172-related lineages successfully predicted the distribution of both Neolithic figurines and painted pottery attributed to agriculturalists (King and Underhill 2002). The Upper Paleolithic sites in Turkey (Öküzini cave, region 6) have been dated to 17,800 BC and suggest a continuous occupation into the subsequent Neolithic period (Kuhn 2002), while Neolithic sites are considerably fewer in Central and Northern Turkey (Roberts 2002). The J1-M267 and J2-M172 distributions in the Near East and Europe can be inferred from previously reported DYS388 data associated with Eu10 and Eu9, respectively (Semino et al. 2000a; Nebel et al. 2001b; Malaspina et al. 2001; Al-Zahery et al. 2003).
While both J1 and J2 are found in the Near East, haplogroup J1-M267 typifies East African and Arabian populations, with a decreasing frequency northwards. By contrast, the majority of J lineages in Europe are J2-M172, which radiated from the Levant, coherent with the distributions of mitochondrial J, K, T1 and pre-HV clades (Richards et al. 2002).

Although we currently lack additional binary polymorphisms capable of further informative subdivision within haplogroup J1-M267, the lineage carrying the unusual short 13-repeat DYS388 allele provides a proxy. These peculiar chromosomes are distributed along the northern tier of Turkey. While this lineage has not been observed in Greece, it has been detected in Georgia (Semino, unpublished results), suggesting gene flow along the Black Sea coast. A few lineages with potentially similar affinity have been observed scattered throughout the Middle East (Nebel et al. 2001b), although it is not possible to assign them to haplogroup J-M304* or J1 since M267 data are unavailable. When the DYS388 “short” allele representatives are excluded on the assumption that they have a common origin, the residual assemblage of J1-M267 lineages with “long” DYS388 alleles contains numerous haplotypes, including both the purported “Cohen” and “Arab” modal haplotypes (Thomas et al. 2000; Nebel et al. 2002). The similarity of the variances associated with the two counterbalancing J1 and J2 sub-clades suggests an enduring common demography. At this level of molecular resolution, the data do not distinguish between agricultural and pastoral livelihoods, despite the observation that lifestyle differences exist (Khazanov 1984). Notably, nomads are often more endogamous and participate in transhumant seasonal migrations (Cavalli-Sforza et al. 1994).

The J2f-M67 clade is localized to Northwest Turkey. It is well known that during this period, Northwest Anatolia developed a complex society engaged in widespread Aegean trade, referred to as the “Maritime Troia culture,” involving both the western Anatolian mainland and several of the large eastern Aegean islands (Chios, Lemnos and Lesbos) (Korfmann 1996). Another J2 component is intriguing. Although J2e-M12 lineages occur at low frequencies, they are widely distributed in the Middle East (Scozzari et al. 2001) and India (Kivisild et al. 2003), as well as among the Saami of Kola, Russia (Raitio et al. 2001). By comparing data sets (Malaspina et al. 2001; Scozzari et al. 2001), we deduced that J2e-M12 lineages are distinct from all other J2-M172 lineages on the basis of the complex dinucleotide STRs DYS413 and YCAII. In corroboration, sequencing of the simple repeat locus DYSA7.2 confirmed that, in Turkey, J2e-M12 is exclusively associated with shorter alleles of seven or eight tetranucleotide repeats. The considerable diversification observed in the J clade, as exemplified by the high variance of J2-M172 and a J-M304* lineage in southeastern Anatolia, is consistent with the early onset of postglacial sedentism found in the archeological record of Anatolia and the Levant (Bar-Yosef 1998).

G-M201 and post ice-age expansions in Europe

Although recurrent mutation can occur in the complex p49a,f RFLP system, the TaqI ht8 restriction profile occurs only within haplogroup J and G lineages (Semino et al. 2000a), suggesting common ancestry. The geographic overlap of J and G lineages bolsters this putative affinity. The apparent scarcity of Upper Paleolithic sites in Anatolia (Kuhn 2002) and the considerable diversification of haplogroup G and J ancestors are consistent with an Upper Paleolithic/Mesolithic Middle Eastern/Mesopotamian origin and the subsequent gradual proliferation of agriculturalists, including their presence (e.g., Çatal Höyük, region 7) during the early Pre-Pottery Neolithic B period (~9,500 BP). Haplogroup G-M201 lineages occur at ~30% in Georgia (Semino et al. 2000a) and the north Caucasus (Nasidze et al. 2003). Haplogroup G-M201 also occurs in Southeast Europe and the Mediterranean (Semino et al. 2000a) and in Iraq (Al-Zahery et al. 2003). In a material context, the Bronze Age Hattic and Kaska cultural region in Anatolia (Fig. 1) shows affinity to the Maikop culture of the Caucasus, and linguistic affinity to the northwest Caucasian languages (Renfrew 1998). Populations speaking such languages show a high frequency of G-M201 (Nasidze et al. 2003). Haplogroup G2-P15 is the most frequent (9%) G sub-clade in Turkey. G2-P15 lineages have been observed throughout the Middle East, with a maximum of 19% in the Druze (Hammer et al. 2000), and at an average of 5% in Italy and Greece (Di Giacomo et al. 2003). The expansion time estimates for G2-P15 closely approximate those for R1b3-M269.

Role of R1b3-M269 in the Aurignacian and Neolithic eras

Haplogroup R1b3-M269 is one of the most common binary lineages observed in Turkey. The phylogenetic and spatial distribution of its equivalent in Europe (Cruciani et al. 2002), the R1-M173 (xM17) lineage, for which considerable data exist (Semino et al. 2000a; Wells et al. 2001; Kivisild et al. 2003), implies that R1b3-M269 was well established throughout Paleolithic Europe, probably arriving from West Asia contemporaneously with the Aurignacian culture. Although the phylogeographic pattern of R1b3-M269 lineages in Europe suggests that R1-M173* ancestors first arrived from West Asia during the Upper Paleolithic, we cannot deduce whether R1b3-M269 first entered Anatolia via the Bosporus isthmus or from the east. However, archeological evidence favors the arrival of Aurignacian culture in Anatolia from Europe during the Upper Paleolithic rather than from the Iranian plateau (Kuhn 2002).

Haplogroup R1b3-M269 occurs at 40–80% frequency in Europe, and the associated STR variance suggests that the last ice age restricted R1b3-M269 to refugia in Iberia and Asia Minor, from which it subsequently radiated during the Late Upper Paleolithic and Holocene. The opposing distributions of the R1b3-M269-related TaqI p49a,f ht15 and ht35 haplotypes reflect the re-peopling of Europe from Iberia and Asia Minor during that period. The R1b3-M269 variances and expansion time estimates of Iberian and Turkish lineages are similar to each other (Table 2) but higher than those observed elsewhere (Table 4). Low variances for R1b3-M269 lineages have also been reported for Czech and Estonian populations (Kivisild et al. 2003).
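The variance comparisons above rest on the spread of STR repeat counts among the chromosomes of one haplogroup. A minimal sketch follows, assuming the common convention of averaging the per-locus variance of repeat counts across loci; the estimator, locus set and any weighting actually used for Tables 2 and 4 are not described in this section. The function name `average_str_variance` and the three haplotypes in the usage example are hypothetical placeholders, not data from this study.

```python
from statistics import pvariance, variance
from typing import Mapping, Sequence


def average_str_variance(
    haplotypes: Sequence[Mapping[str, int]], unbiased: bool = True
) -> float:
    """Mean, across STR loci, of the variance in repeat counts within one haplogroup.

    Hypothetical helper: each haplotype maps a locus name to its repeat count,
    and every haplotype is assumed to be typed at the same loci as the first.
    """
    if len(haplotypes) < 2:
        raise ValueError("at least two chromosomes are needed to estimate a variance")
    loci = list(haplotypes[0])
    per_locus = []
    for locus in loci:
        repeats = [h[locus] for h in haplotypes]
        # Sample (n - 1) variance by default; population variance if unbiased=False.
        per_locus.append(variance(repeats) if unbiased else pvariance(repeats))
    return sum(per_locus) / len(per_locus)


# Hypothetical haplotypes at three loci (illustrative values only).
example = [
    {"DYS19": 14, "DYS388": 12, "DYS390": 24},
    {"DYS19": 15, "DYS388": 12, "DYS390": 24},
    {"DYS19": 14, "DYS388": 12, "DYS390": 23},
]
print(round(average_str_variance(example), 3))  # 0.222
```

A higher average indicates more accumulated mutational diversity within the haplogroup; turning such values into expansion times additionally requires a mutation-rate calibration, which this sketch leaves out.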

In contrast, R1a1-M17, the counterpart R1-M173-related clade, is frequent (30–60%) in East Europe, Central Asia, and Northwest India (Semino et al. 2000a; Wells et al. 2001; Passarino et al. 2001; Kivisild et al. 2003). This pronounced R1-M173-related Y-chromosome substructure contrasts with the uniform frequency spectrum observed for the major mitochondrial DNA haplogroups in Europe. The higher frequency of R1a1-M17 lineages in eastern Turkey is consistent with an entry into Anatolia via the Iranian plateau, where the associated variance is appreciably higher (Quintana-Murci et al. 2001). The most common R1a1-M17 haplotype in Armenia (Weale et al. 2001) matches the most common one in Turkey.

Haplogroup I-M170 indicates gene flows from Croatia

The phylogeography and high associated variance of I-M170 are consistent with an in situ European origin of M170 in the Balkans (Semino et al. 2000a), possibly near the Dinaric Mountain chain in Croatia, where it has been observed at the highest frequency known so far (Barac et al. 2003). I-M170 lineages radiated both towards north central Europe and into western Turkey. Comparison of STR haplotypes indicates that the Dinaric modal haplotype is associated with the I-P37 lineages observed in Turkey. Molecular analyses of I-M170 lineages at equivalent resolution in present-day Bulgaria, Croatia and Greece will be required to better understand the phylogeography of I-M170 sub-clades.

Haplogroup E3b and Neolithic expansions

While E3b1-M78 and E3b3-M123 occur at similar frequencies in Turkey, the variance of the former is considerably lower than that of the latter, suggesting differences in either time depth or effective population size. The prevalence of haplogroup E(xM2) chromosomes in northern Egypt may reflect the source of non-African E3b lineages (Manni et al. 2002). Haplogroup E3b1-M78 haplotypes typify European lineages (Semino, unpublished) and have expansion dates consistent with the expansion of agriculturalists (Table 2). Haplogroup J2-M172 lineages likely reflect the introduction of agriculture to India from the Middle East (Kivisild et al. 2003). Had E3b3-M123 dispersed early from the Middle East, E3b lineages might be expected to have reached India as well. Their absence in India therefore supports the inference that the higher variance and older expansion dates for E3b3-M123 in Turkey reflect not an earlier dispersal, but rather multiple founders with more associated diversity.

The spread of haplogroup L-M11 lineages is largely restricted to populations of the south Caucasus (Weale et al. 2001), the Middle East (Nebel et al. 2001b), Pakistan (Qamar et al. 2002) and India (Kivisild et al. 2003). Interestingly, Turkish L lineages lack the M27 mutation that characterizes Indian and Pakistani L lineages. Although no M27 data exist for Armenians, the Armenian haplogroup L modal haplotype at six STR loci (Weale et al. 2001) matches the most common Turkish counterpart. Any attempt to interpret other informative lineages in Turkey, such as I1a-M253, J2a-M47, J2f-M67, K2-M70 and N-M231, is premature until they have been adequately surveyed elsewhere.

Minor genetic influence of Turkic speakers

Various estimates exist of the proportion of gene flow associated with the arrival of Central Asian Turkic-speaking people in Anatolia. One study, based on analyses of six STR loci in 88 Y chromosomes from Turkey, suggested only a 10% contribution (Rolf et al. 1999). Another study suggests roughly 30%, based on mtDNA control region sequences and one binary and six STR Y-chromosome loci analyzed in 118 Turkish samples (Di Benedetto et al. 2001). While gene flow between Central Asia and Anatolia has likely occurred repeatedly throughout prehistory, uncertainties regarding source populations and the number of such episodes between Central Asia and Europe confound any assessment of the contribution of the 11th-century AD Oghuz nomads responsible for the Turkic language replacement. These new Y-chromosome data provide candidate haplogroups that differentiate lineages specific to the postulated source populations, thus overcoming potential artifacts caused by indistinguishable overlapping gene flows. The best candidates for such estimates are the Asian-specific haplogroups C-RPS4Y (Wells et al. 2001; Karafet et al. 2001; Zerjal et al. 2003) and O3-M122 (Su et al. 2000). These lineages occur at 1.5% in Turkey (8/523). Using Central Asian Y-chromosome data from either 13 populations and 149 samples (Underhill et al. 2000) or 49 populations and 1,935 samples (Wells et al. 2001), in which these diagnostic lineages occur at 33% and 18%, respectively, their estimated contributions range from 0.0153/0.329 × 100 = 4.6% to 0.0153/0.180 × 100 = 8.5%. From the Bronze Age onward, the population of Anatolia expanded, reaching an estimated 12 million during the late Roman Period (Russell 1958). Such a large pre-existing Anatolian population would have reduced the impact of the subsequent arrival of Turkic-speaking Seljuk and Osmanlı groups from Central Asia.
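The contribution range above follows from a simple proportion: if the diagnostic Asian-specific lineages (C-RPS4Y and O3-M122) entered Anatolia only with Central Asian migrants, their frequency in Turkey divided by their frequency in the source approximates the source's share of Turkish Y chromosomes. The sketch below reproduces that arithmetic. The function name `source_contribution` is hypothetical, and the single-source premise is the simplifying assumption behind the estimate, not an established fact.

```python
def source_contribution(freq_in_recipient: float, freq_in_source: float) -> float:
    """Share of recipient chromosomes attributable to one source population.

    Assumes the diagnostic lineages reached the recipient only from that source,
    so that recipient frequency = contribution * source frequency.
    """
    if not 0.0 < freq_in_source <= 1.0:
        raise ValueError("source frequency must lie in (0, 1]")
    return freq_in_recipient / freq_in_source


turkey_freq = 8 / 523  # C-RPS4Y plus O3-M122 chromosomes in the Turkish sample
central_asian_freqs = {
    "Underhill et al. 2000": 0.329,
    "Wells et al. 2001": 0.180,
}
for study, source_freq in central_asian_freqs.items():
    share = 100 * source_contribution(turkey_freq, source_freq)
    print(f"{study}: {share:.1f}%")
# Underhill et al. 2000: 4.6%
# Wells et al. 2001: 8.5%
```

Because the source frequency sits in the denominator, the larger Wells et al. (2001) survey, with its lower frequency of diagnostic lineages, yields the upper bound of the range.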
Although the genetic legacy of Anatolia remains somewhat inchoate, our excavations of these new levels of shared Y-chromosome heritage and subsequent diversification provide new clues to Anatolian prehistory, as well as a substantial foundation for comparisons with other populations. Our results demonstrate Anatolia’s role as a buffer between culturally and genetically distinct populations, being both an important source and recipient of gene flow.

Acknowledgements

We are grateful to all the donors for providing DNA samples for this study. This study was supported by NIH grants GM28428 and GM55273 to L.L.C.-S. and by Progetti Ricerca Interesse Nazionale 2002 and CNR “Beni Culturali” to O.S. We thank C. Edmonds for regression analyses associated with DYS389 calibration.

Copyright information

© Springer-Verlag 2003