Dispersals of the Siberian Y-chromosome haplogroup Q in Eurasia

The human Y-chromosome has proven to be a powerful tool for tracing the paternal history of human populations and genealogical ancestors. The human Y-chromosome haplogroup Q is the most frequent haplogroup in the Americas. Previous studies have traced the origin of haplogroup Q to the region around Central Asia and Southern Siberia. Although the diversity of haplogroup Q in the Americas has been studied in detail, investigations on the diffusion of haplogroup Q in Eurasia and Africa are still limited. In this study, we collected 39 samples from China and Russia, investigated 432 samples from previous studies of haplogroup Q, and analyzed the single nucleotide polymorphism (SNP) subclades Q1a1a1-M120, Q1a2a1-L54, Q1a1b-M25, Q1a2-M346, Q1a2a1a2-L804, Q1a2b2-F1161, Q1b1a-M378, and Q1b1a1-L245. Through NETWORK and BATWING analyses, we found that the subclades of haplogroup Q continued to disperse from Central Asia and Southern Siberia during the past 10,000 years. Apart from its migration through Beringia to the Americas, haplogroup Q also moved from Asia to the south and to the west during the Neolithic period, and subsequently to the whole of Eurasia and part of Africa.

Electronic supplementary material
The online version of this article (doi:10.1007/s00438-017-1363-8) contains supplementary material, which is available to authorized users.


Communicated by S. Hohmann.

Introduction
In recent decades, the human Y-chromosome has proven to be a powerful tool for tracing the paternal history of human populations and genealogical ancestors. The human Y-chromosome haplogroup Q (also named Q-M242 in accordance with its defining mutation) probably originated in Central Asia and Southern Siberia 15-25 KYA (thousand years ago) (Karafet et al. 2002, 2008; Bortolini et al. 2003; Seielstad et al. 2003), and subsequently diffused eastward, westward, and southward (Zhong et al. 2011; Di Cristofaro et al. 2013; Sandoval et al. 2013; Liu et al. 2014; Rasmussen et al. 2014). Haplogroup Q has several subclades defined by single nucleotide polymorphisms (SNPs), and it reaches its highest frequency of 70-100% in the Americas (Bortolini et al. 2003; Seielstad et al. 2003; Zhong et al. 2011; Rasmussen et al. 2014). Although the diversity of haplogroup Q in the Americas has been studied in detail (Toscanini et al. 2011, 2016; Jota et al. 2011; Malyarchuk et al. 2011; Dulik et al. 2012b; Battaglia et al. 2013; Lardone et al. 2013; Melton et al. 2013; Regueiro et al. 2013; Noguera et al. 2014; Sala and Corach 2014; Torres et al. 2015), investigations on the diffusion of haplogroup Q in Eurasia and Africa are still limited. Consequently, we studied samples of haplogroup Q in Eurasia to explore how it expanded from Central Asia and Southern Siberia during the Neolithic period.
The ancestors of present-day Native Americans migrated to the Americas from Siberia via Beringia around 16 KYA (Raghavan et al. 2015; Llamas et al. 2016). Q1a2a1-L54 and its subclade Q1a2a1a1-M3 are the two predominant subclades of haplogroup Q found on both sides of the Bering Strait. Q1a2a1-L54 has spread throughout Northern Asia, the Americas, and Western and Central Europe (Raff and Bolnick 2014; Rasmussen et al. 2014). An ancient individual of the Clovis culture belonged to Q1a2a1-L54 (xQ1a2a1a1-M3) (O'Rourke and Raff 2010; Rasmussen et al. 2014). Q1a2a1a1-M3, one of the most thoroughly studied subclades within haplogroup Q, is frequent both in the Chukotka Peninsula of Siberia (close to Alaska) and in the Americas (Lell et al. 2002). Previous studies indicated that Q1a2a1a1-M3 migrated from Siberia to the Americas and partially returned to Siberia (Hammer et al. 1997; Lell et al. 1997; Bortolini et al. 2003; Pakendorf et al. 2007). The estimated age of Q1a2a1a1-M3 is 13-22 KYA (Dulik et al. 2012a). Q1a2a1a1a-M19, a subclade of Q1a2a1a1-M3, remained in South America and shows a pattern of diversification similar to that of its upstream lineage. The age of Q1a2a1a1a-M19 is approximately 7-8 KYA (Bortolini et al. 2003; Jota et al. 2011).
Haplogroup Q has also appeared in other parts of the world. For instance, an ancient DNA study of a Saqqaq individual in Greenland suggests that haplogroup Q1a-MEH2 was frequent in Siberian and Native American populations (Karafet et al. 2008; Rasmussen et al. 2010; Raghavan et al. 2015). A few subclades of haplogroup Q have been identified in the Comoros population in Africa (Q1a2-M346) and the Polynesian islands in Oceania (Q1a2a1a1c-M199) (Hurles et al. 2003; Msaidie et al. 2010).
Nowadays, the distribution of haplogroup Q in the Americas has been studied thoroughly, but we know little about its dispersals on western and southern routes. In this study, we present an analysis of some SNP subclades of haplogroup Q, including Q1a1a1-M120, Q1a2a1-L54, Q1a1b-M25, Q1a2-M346, Q1a2a1a2-L804, Q1a2b2-F1161, Q1b1a-M378, and Q1b1a1-L245. Based on NETWORK and BATWING analyses of haplogroup Q, we were able to better understand its dispersals on western and southern routes, and their impacts on Eurasian populations.

Ethics statement
This study was conducted after approval by the Ethical Committee of the School of Life Sciences, Fudan University (Shanghai, China) and the Ethical Committee of the Lomonosov Moscow State University (Moscow, Russia). All sample donors were fully informed and signed informed consent forms before sample collection.

Population samples
In this study, a total of 471 unrelated male samples were analyzed. We collected blood samples from 1757 healthy and unrelated volunteers from five populations in China, including 700 Hui, 64 Bao-An, 109 Dong-Xiang, 90 Li-Qian, and 794 Shao-Xing individuals. In addition, we collected saliva samples from 30 healthy and unrelated volunteers from three populations in Russia, including 4 Enets, 19 Ket, and 7 Selkup individuals. After genotyping all samples, we confirmed that 16 samples from China and 23 samples from Russia belonged to haplogroup Q; these 39 samples were investigated further in this study. Furthermore, data from previous studies were also analyzed (Bailliet et al. 2009; Zhong et al. 2011; Lacau et al. 2012; Dulik et al. 2012; Di Cristofaro et al. 2013; Sandoval et al. 2013; Varzari et al. 2013; Hollard et al. 2014; Liu et al. 2014; Family Tree DNA). The haplogroup Q samples were categorized by place of residence as follows. From Gansu Province of China: Bao-An, one individual from Ji-Shi Mountain; Dong-Xiang, two individuals from Dong-Xiang County, Hui Autonomous Prefecture of Lin-Xia; Li-Qian, four individuals from Yong-Chang County, Jin-Chang City. From Zhejiang Province of China: Shao-Xing, nine individuals from Shao-Xing City. From the Krasnoyarsk Region of Russia: Enets, two individuals from Potapovo; Ket, one individual from each of Farkovo, Sulomai/Bor, Sumarokovo, Turukhansk, and Verkhneimbatsk, two individuals from each of Bakhta, Baklanikha, and Kellog, and five individuals from Sulomai; Selkup, three individuals from Farkovo and two individuals from Turukhansk. These three populations are considered minorities in Russia according to the 2002 All-Russia Population Census (ESM_3): Enets (named Entses in ESM_3) number 237 individuals, Ket 1494, and Selkup 4249.
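The sample counts given in this section can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python; the dictionary and function names are illustrative assumptions, while every number is copied from the text (the figure of 432 previously published samples comes from the abstract).

```python
# Volunteers sampled per population, as stated in the text.
COLLECTED_CHINA = {"Hui": 700, "Bao-An": 64, "Dong-Xiang": 109,
                   "Li-Qian": 90, "Shao-Xing": 794}
COLLECTED_RUSSIA = {"Enets": 4, "Ket": 19, "Selkup": 7}

# Haplogroup Q carriers per population, from the residence breakdown.
Q_CHINA = {"Bao-An": 1, "Dong-Xiang": 2, "Li-Qian": 4, "Shao-Xing": 9}
Q_RUSSIA = {
    "Enets": 2,
    # five villages with one Ket each, three with two each, five from Sulomai
    "Ket": 5 * 1 + 3 * 2 + 5,
    "Selkup": 3 + 2,
}

PREVIOUS_STUDIES = 432  # haplogroup Q samples taken from earlier publications


def tally():
    """Return the totals implied by the per-population counts."""
    q_new = sum(Q_CHINA.values()) + sum(Q_RUSSIA.values())
    return {
        "collected_china": sum(COLLECTED_CHINA.values()),
        "collected_russia": sum(COLLECTED_RUSSIA.values()),
        "q_china": sum(Q_CHINA.values()),
        "q_russia": sum(Q_RUSSIA.values()),
        "q_new": q_new,
        "total_analyzed": q_new + PREVIOUS_STUDIES,
    }


if __name__ == "__main__":
    for key, value in tally().items():
        print(f"{key}: {value}")
```

Under these counts the per-population figures add up to the stated totals of 1757 and 30 volunteers, 16 and 23 haplogroup Q carriers, and 471 analyzed samples.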

Y-chromosome markers
Genomic DNA was extracted from the blood samples using the DP-318 Kit (Tiangen Biotechnology, Beijing, China), and the DNA extraction protocol for the saliva samples was adapted from the high-salt DNA extraction method (Quinque et al. 2006). The samples were typed according to the most recent Y-chromosome phylogenetic tree (ISOGG 2017). The selected samples belonged to several subclades of haplogroup Q.

Statistical analyses
Networks of Y-chromosomal STR data were constructed by the reduced-median method using NETWORK v. 5.0.0.1 (http://www.fluxus-engineering.com) with haplogroups ...
We used the Markov chain Monte Carlo (MCMC) approach (Wilson et al. 2003) incorporated into the program BATWING to estimate the time to the most recent common ancestor (TMRCA) and the expansion time of the aforementioned Q subclades. Time estimates for subclades of haplogroup Q were made using seven to fifteen of the STRs listed above. A model of exponential growth from an initially constant-sized population was employed in BATWING to obtain the time estimates. Following Wei et al. (2013), four sets of widely used Y-STR mutation rates were applied in the time estimates: the evolutionary mutation rate (EMR) (Zhivotovsky et al. 2004), two observed genealogical mutation rates (OMRB and OMRS) (Shi et al. 2010; Burgarella and Navascués 2011), and a genealogical mutation rate adjusted for population variation using a logistic model (lmMR) (Wilson et al. 2003). A generation time of 30 years was used to convert the time estimates into years (Tremblay and Vézina 2000). We applied weakly informative priors in the BATWING estimations and analyzed each population individually. For the initial effective population size (N), we used the broad prior gamma (1, 0.0001) (mean = 10,000, SD = 10,000). For the population growth rate per generation (α), we also used a broad prior, gamma (2, 400) (mean = 0.005, SD = 0.0035). For the time, in coalescent units, at which exponential growth began (β), we used gamma (2, 1) (mean = 2, SD = 1.41) (Xue et al. 2006). A total of 10^4 samples of the program's output, representing 10^6 MCMC cycles, were taken after discarding the first 3 × 10^3 samples as "burn-in" (Xue et al. 2006), and convergence was confirmed by examining longer runs for all populations and finding the same posterior distributions.
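The means and standard deviations quoted for the three gamma priors can be reproduced directly. The sketch below assumes that gamma (a, b) denotes shape a and rate b, a reading inferred from the quoted means rather than stated in the text, so that mean = a/b and SD = sqrt(a)/b. The names `PRIORS` and `gamma_moments` are illustrative.

```python
import math

# (shape, rate) pairs as quoted in the text; the shape/rate reading
# is an assumption inferred from the reported means.
PRIORS = {
    "N": (1, 0.0001),   # initial effective population size
    "alpha": (2, 400),  # population growth rate per generation
    "beta": (2, 1),     # start of exponential growth, coalescent units
}


def gamma_moments(shape, rate):
    """Return (mean, standard deviation) of a gamma(shape, rate) prior."""
    mean = shape / rate
    sd = math.sqrt(shape) / rate
    return mean, sd


if __name__ == "__main__":
    for name, (shape, rate) in PRIORS.items():
        mean, sd = gamma_moments(shape, rate)
        print(f"{name}: mean = {mean:.4g}, SD = {sd:.4g}")
```

With this parametrization the computed moments agree with the values given in parentheses for N, α, and β.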
The TMRCA was calculated as the product of the estimated population size N and the height of the tree T (in coalescent units).
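The conversion from BATWING output to a date in years can be illustrated as follows. This is a minimal sketch, assuming that one coalescent unit corresponds to N generations (as the product N × T implies) and using the 30-year generation time stated above. The function name and the example values of N and T are hypothetical and are not taken from Table 1.

```python
import math

GENERATION_YEARS = 30  # generation time used in this study


def tmrca_years(n_effective, tree_height, generation_years=GENERATION_YEARS):
    """Convert effective size N and tree height T (coalescent units) to years."""
    if n_effective <= 0 or tree_height < 0:
        raise ValueError("N must be positive and T non-negative")
    generations = n_effective * tree_height  # TMRCA in generations
    return generations * generation_years


if __name__ == "__main__":
    # Hypothetical posterior values chosen only for illustration.
    years = tmrca_years(n_effective=1000, tree_height=0.15)
    print(f"TMRCA: {years:.0f} years ({years / 1000:.1f} KYA)")
```

In practice the product would be formed for each retained MCMC sample and then summarized, for example by the median and a credible interval.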
A contour map of the frequencies of haplogroup Q-M242 was generated using the Kriging procedure in Golden Software Surfer 11 (Golden Software Inc., CO, USA) (Fig. 1). Because the frequency data were obtained from many sources, the subclades of haplogroup Q that had been identified differed among sources. To show all frequencies in one figure, we integrated the frequencies of the different subclades into frequencies of Q-M242. The raw frequency data and references are shown in ESM_2.
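The integration step can be sketched as a simple summation. The example below assumes that each individual is counted once, under the most derived subclade reported by the source, so that subclade counts do not overlap. The function name and the counts are invented for illustration and do not come from ESM_2.

```python
def q_m242_frequency(subclade_counts, sample_size):
    """Pool non-overlapping Q subclade counts into one Q-M242 frequency."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    total_q = sum(subclade_counts.values())
    if total_q > sample_size:
        raise ValueError("more haplogroup Q carriers than sampled individuals")
    return total_q / sample_size


if __name__ == "__main__":
    # Hypothetical population: 100 males, 5 of whom carry haplogroup Q.
    counts = {"Q1a1a1-M120": 3, "Q1a2-M346": 2}
    print(f"Q-M242 frequency: {q_m242_frequency(counts, 100):.2%}")
```

If a source reports nested counts (for example Q1a2-M346 inclusive of Q1a2a1-L54), the downstream counts would first have to be subtracted from the upstream ones to avoid double counting.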

Worldwide distribution of haplogroup Q-M242
We calculated the frequencies of our samples and collected the frequency data from previous studies (ESM_2). As can be seen in Fig. 1, the frequencies of haplogroup Q-M242 are low in most of the world, except for the Americas and a small part of Siberia, which matches previously published observations on the distribution of haplogroup Q (Balanovsky et al. 2017). Moreover, we represented the migration routes of haplogroup Q-M242 based on our results and previous studies (Fig. 1, ESM_2). We also marked the main distribution regions of the subclades studied in this research (Fig. 1). We have constructed a phylogenetic tree within haplogroup Q to easily identify the downstream subclades (Fig. 2).

The network of haplogroup Q subclades
To reveal the detailed structures of the subclades of haplogroup Q, we conducted a network analysis combining the SNP and the STR haplotype data for 471 individuals (Fig. 3). The network of Q1b1a1-L245 had a star-like shape formed by Jewish samples together with a small number of European and Western Asian samples. We did not discuss the origins and migrations of samples from the Americas because we focused on the dispersals of haplogroup Q in Eurasia and used samples from the Americas only to construct the network.

Time estimates for haplogroup Q
We used BATWING to estimate the TMRCA and the expansion time for the subclades of haplogroup Q. As seen in Table 1, the three genealogical mutation rates gave broadly similar results, whereas the evolutionary mutation rate resulted in a much older TMRCA. Genealogical mutation rates are more reliable when a large number of loci and closely related individuals are analyzed, whereas the evolutionary mutation rate tends to be more effective for estimates based on a smaller number of loci and genetically distant individuals (Wang and Li 2015). Since we used seven to fifteen loci in the time estimates, and the populations analyzed belonged to the same subclades of haplogroup Q, we decided to use the results of the three genealogical mutation rates.
Both Q1a1b-M25 and Q1a2-M346 subclades were frequent in Turkic-speaking populations, and their time estimates were approximately 3-5 KYA (ESM_1; Table 1). According to Fig. 3 ... (Yunusbayev et al. 2015). Therefore, we suggest that Q1a1b-M25 and Q1a2-M346 probably migrated with Turkic nomads from Southern Siberia to most parts of Eurasia. The few Q1a1b-M25 and Q1a2-M346 samples in Mongolic-speaking populations probably indicate that Turkic nomads overlapped with Mongolic-speaking populations when they lived in the present Mongolian territory (Yunusbayev et al. 2015). An ancient DNA study showed that the Hungarians probably originated from Central Asia-Southern Siberia at approximately 4 KYA (Neparáczki et al. 2016), which is consistent with our time estimates (Table 1). Therefore, we propose that Q1a1b-M25 and Q1a2-M346 had migrated from Central Asia-Southern Siberia to Central Europe by at least 4 KYA. Three individuals from Africa (the Comoros Islands) who belonged to Q1a2-M346 reaffirm that Middle Eastern populations had a genetic influence on the Comoros Islands (Gourjon et al. 2011).
Subclades Q1a2a1a2-L804 and Q1a2b2-F1161 are downstream of Q1a2-M346 (Fig. 2), and both are mainly distributed in Western and Northern Europe (Fig. 3). Q1a2a1a2-L804 arrived in Western and Northern Europe as early as 5-7 KYA (Table 1). Ancient DNA studies showed that the first European farmers migrated from Central Europe to Western and Northern Europe between 5 and 7.5 KYA (Haak et al. 2005, 2010; Bramanti et al. 2009; Malmström et al. 2009). Therefore, we suppose that Q1a2a1a2-L804 spread from Central Europe to Western and Northern Europe with early Neolithic European farmers. The time estimate for Q1a2b2-F1161 was one thousand years later than that of its upstream clade Q1a2-M346 (Table 1), and it seems to be unrelated to the Neolithic transition of Europe (Haak et al. 2010). Since Q1a2-M346 spread across Europe at that time, it probably brought Q1a2b2-F1161 to Western and Northern Europe, and even to Western and Southern Asia (Khurana et al. 2014; Yunusbayev et al. 2015).
Subclades Q1b1a-M378 and Q1b1a1-L245 were correlated with the Jewish people, and both probably reflect the expansion of some Jewish Diaspora populations into Europe within historical times (Table 1; Fig. 3). As seen in Fig. 3, the central clusters of Q1b1a-M378 and Q1b1a1-L245 mainly consisted of samples from Central and Eastern Europe. The results reaffirm that some Jewish Diaspora populations migrated from Central and Eastern Europe and finally settled in other parts of Europe (Nogueiro et al. 2010; Zoossmann-Diskin 2010). Previous Y-chromosome studies showed that haplogroups J, R and Q3a1 occurred in certain proportions in Jewish populations and spread over Europe (Nogueiro et al. 2010; Chaubey et al. 2016; Balanovsky et al. 2017). Subclades Q1b1a-M378 and Q1b1a1-L245 probably spread over Europe together with haplogroups J, R and Q3a1. The Q1b1a-M378 samples from Southern Asia might represent descendants of Ashkenazi Jewish populations, because the upstream haplogroup Q-P36 was regarded as a minor Ashkenazi Jewish founding lineage in Southern Asia (Lee et al. 2014).
Our study of the human Y-chromosome haplogroup Q in Eurasia revealed a clear pattern of its migration routes during the past 10,000 years, especially in Han Chinese, Yeniseian-, Samoyedic-, and Turkic-speaking, and Jewish populations. A higher-resolution database will help to draw further conclusions on the origins, migrations, and ethno-linguistic affiliations of haplogroup Q.

Informed consent
Informed consent was obtained from all individual participants included in the study.

Open Access
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.