Background

In the past, markers on the non-recombining region of the Y chromosome (NRY) have been used extensively as a male complement to mtDNA to study the colonization or migration histories within the different regions of the world. Y-chromosomal haplogroup (Y-HG) Q, defined by either of the binary markers P36/MEH2 [1] or M242 [2], has been suggested to originate in Asia and represent recent founder paternal Native American radiation into the Americas. In Eurasia, haplogroup Q chromosomes have been reported with the highest frequency in Siberian populations, distributed primarily across Northwest and Northeast Siberia with the vast majority in only two Siberian populations, the Kets (93.8%) and the Selkups (66.4%) [3]. Other population groups from the region bearing this haplogroup include, Kyrgyz (2%), Kazak (6%), Kallar (1%), Shiraz (8%), Bartangi (13%), Korean (2%), Yagnobi (3%), Esfahan (6%), Turkmen (10%), Dungan (8%), Tuvinian (17%), Uzbek/Kashkadarya (5%), Shugnan (11%), Uzbek/Bukhara (2%), Uzbek/Surkhandarya (4%), Yadhava (3%), Kazan Tatar (3%), Tajik/Samarkand (5%), Uighur (5%), Uzbek/Khorezm (9%), Uzbek/Tashkent (14%), Arab/Bukhara (14%), Uzbek/Fergana Valley (5%) and Uzbek/Samarkand 7%) [2]. Interestingly, Y-HG Q is the youngest paternal haplogroup observed with less frequent subgroups which are geographically restricted. These have been delineated into Q1, Q2 and Q3 subgroups defined by biallelic markers M120, M25/M143 and M3, respectively[1]. Recently, a novel subgroup Q4 was identified, defined by the bi-allelic marker M346, representing HG Q (0.41%, 3/728) in Indian populations [4].

With very less details of HG Q in Asia, especially India, it was pertinent to explore the status of the Y-HG Q in Indian populations to gather an insight to determine the extent of diversity within this region.

Results and Discussion

In the present study, we screened 630 samples belonging to different regions of India and observed 2.38% (15/630) individuals bearing Y-HG Q. It was interesting to observe that 14/15 samples did not show any of the already known Y-HG Q sub-haplogroups (Q1, Q2, Q3 and Q4), defined by biallelic markers M120, M25/M143, M3 and M346, respectively (Table 1). Only one individual was observed with the presence of M120 polymorphism, representing Q1 lineage.

Table 1 Indian populations screened for Y-HG Q and its distribution.

Further, a novel 4 bp del/ins polymorphism (rs41352448, details provided in the Additional file 1) at 72,314 position of human arylsulfatase D pseudogene (ARSDP gene), in 5/15 individuals (33.3% of the observed Y-HG Q in the study) indicated the presence of a novel sub-group Q5 of Y-HG Q. In order to establish the exclusiveness of this polymorphism to Y-HG Q, we screened some of our samples already categorized in other unrelated haplogroups like R1a1, R2, L, H1, J, C, etc. The presence of an ancestral allele (without insertion at 72,314 position of ARSDP gene) in these samples confirmed the restriction of this novel polymorphism within the Y-HG Q. We also screened the sample with a derived state at M120 marker in the present study (representing Q1), using ss4 bp marker and found an absence of the polymorphism. In order to assign an independent status to designated Q5 and to confirm the placement of M346 derived samples as Q4 [4], it was necessary to study the novel ss4 bp marker (Q5) in M346 derived samples (Table 1). Three samples provided on request were screened for the ss4 bp polymorphism. The absence of this polymorphism in these three samples not only confirmed the authenticity of Q4 lineage but also validated the independent status of Q5 observed by us (Figure 1). In addition to the Y-HG Q5 (5/15) and Q1 (1/15) samples, there were 9/15 (60%) individuals who did not show any of the known Q subgroup signature and were designated as Q*.

Figure 1
figure 1

Partial YCC tree redrawn with the addition of M346 marker defining Y-HG Q4 and ss4 bp marker defining Y-HG Q5.

In order to put together our observations along with those made in literature earlier, we pooled data of 1615 Y- chromosomes (630 present study and 985 from literature) (Table 1) for analyses, of which 21/1615 (1.3%) samples represented Q lineage in India. All of our 15 samples and 3 samples belonging to Y-HG Q4 [4] were genotyped for 12 Y-microsatellite markers. However, to keep uniformity in evaluation, data of 10 overlapping Y-STR markers from present study and from different population groups of central Asia and India was used to construct median joining network [5], although 12 Y-STR markers were analysed by us (Additional file 2). For most of the Indian Y-HG Q and its sub-lineages, three clusters of Y-STR haplotypes were observed (Figure 2). One cluster included all the three Q4 and one Q*, another with all the Q5 and the third with most of the Q* bearing individuals. It was interesting to find that most of the Indian Q (Q4, Q5 and Q*) associated Y-STR haplotypes were separated from the bulk of Central Asian Q* associated haplotypes. Further, the clustering of the Y-STR haplotypes reassured the findings by the bi-allelic markers. This could either be due to population differentiation or because of the presence of these clusters in the ancestral migration from Central Asia, not clear at the moment. The increased diversity within the Indian population clusters could be interpreted as an overall effect of geographical differentiation, population expansions and severe bottlenecks resulting in loss of many of the in-between haplotypes thus, reducing the reticulation and increasing the branch lengths. It could also be as a result of independent migrations and admixture.

Figure 2
figure 2

Median joining network-showing relationship of individuals bearing Y-HG Q and its subgroups in population groups of Central Asia and India.

The age estimations made using a small sample size need to be increased which is not feasible at the moment, keeping in mind a very low frequency of Y-HG Q in India. The age estimation for haplogroup Q in India was carried out on the bigger cluster bearing Q* and Q5 in the median joining network. The calculated age of 47,101.5 (34,210.5 – 75,581.4) Years at 95% CI appears to be an over estimate than the age of haplogroup Q (15,000–18,000 Years Before Present) in literature[2, 3, 6]. This probably has occurred due to the enhanced diversity, probably as an effect of population expansions and severe bottlenecks or might be due to later migrations and admixture. A further estimation of the age of Y-HG Q5 alone, using similar parameters, provided an age estimate of 14,492.7 (10,526.3 – 23,255.8) Years at 95% CI.

The compilation of distribution pattern of Y-HG Q in Indian population (Table 1) from the present study as well as from literature, points out that this HG is distributed widely, ranging from Indo-European castes and tribes to their Dravidian counterparts, despite its low frequency. These observations could be explained either on the basis of the ancestral relationship of different Indian population groups, irrespective of linguistic and social divisions or alternatively by some degree of recent gene flow between these groups, not clear at the moment.

Conclusion

Interestingly, apart from assigned Y-HG Q5 (33.3%, 5/15) samples and the only one individual representing Q1 lineage, there were Y-HG Q bearing (60%, 9/15) samples, representing none of the known Q subgroups and designated as Q*. It was quite interesting to observe Y-HG Q in Indian subcontinent though in low frequency but with ancestral Q* state and a novel sub-branch Q5 not reported elsewhere. A novel subgroup Q4 was identified recently which is also restricted to Indian subcontinent [4]. The most plausible explanation for these observations could be an ancestral migration of individuals bearing ancestral lineage Q* to Indian subcontinent with an autochthonous differentiation to Q4 and Q5 sublineages later on. However, other explanations of either the presence of both the sub haplogroups (Q4 and Q5) in ancestral migrants or recent migrations from central Asia, cannot be ruled out till the distribution and diversity of these subgroups is explored extensively in Central Asia and other regions. To conclude, this study presents a novel polymorphism, adding another sublineage Q5 in the already existing arrangement of Y-HG Q in literature.

Methods

Samples

We screened 630 Y-chromosomes belonging to various linguistic families, castes and tribes throughout the Indian region (Table 1) to know the status of Y-HG Q and its sublineages and compared it with the data available in literature for different regions of the world [24, 6, 7] and also for Indian population groups [2, 4, 8].

Markers and their analysis

The binary markers: M45, 92R7, P36, MEH2, M120, M3, M25 [1], M242 [2] and M346 [4] were used to dissect out known Y-HG Q upto its subgroups. Samples were amplified, checked in 2% agarose gel and then sequenced to detect polymorphisms (using ABI 3100 sequencer, USA). We used primer set [Forward: ttgtccagagaaacagccaat and Reverse: atccatctacctacatacctgtcatc] to define the novel 4 bp (GGAT) deletion/insertion polymorphism 'ss4 bp' [details submitted in the dbSNP (BUILD 127); NCBI Assay ID: ss65713825; refSNP ID: rs41352448 and amplicon sequence provided as Additional file 1] at 72,314 position of human arylsulfatase D pseudogene (ARSDP gene). The total PCR reaction mix made was 12.5 μl, containing 50 ng of template DNA, 6.25 pmoles of each primer, 200 μM of dNTPs, 1.5 mM MgCl2, 1× reaction buffer and 0.3 units of Taq pol enzyme (Bangalore Genei, India). The cycling conditions were: denaturation at 94°C for 1 min, followed by annealing at 56°C for 1 min, and then extension at 72°C for 1 min, repeated for 30 cycles followed by a final extension step at 72°C for 5 min. PCR products were initially checked in 2% agarose gel, sequenced (using ABI Prism 3100-Avant Genetic Analyzer, Applied Biosystems, USA) and analyzed in SeqScape software v2.1.1 (Applied Biosystems, USA). These samples were also genotyped for 12 Y-microsatellite markers (Y-STRs): DYS388, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS394, DYS426, DYS437, DYS439, DYS448 described elsewhere[9, 10] to estimate haplotype variation within the HG defined by binary markers (Additional file 2).

Statistical and Phylogenetic analysis

Although 12 Y-STR markers were analysed by us (Additional file 2), however, to keep uniformity in evaluation, data of 10 overlapping Y-STR markers from present study and from different population groups of central Asia and India bearing Y-HG Q was used to construct median joining network[5]. Age estimation[11] was carried out using averaged variance (calculated by MICROSAT software, version 1.5d) of the Y-STRs and mutation rate[12] (μ) = 6.9 × 10-4 and 95% CI [9.5 × 10-4 - 4.3 × 10-4] per generation (g = 25 years).