Introduction

India harbours a pool of genetically and culturally diverse populations (Cavalli-Sforza et al. 1994; Majumder 2001; Saha et al. 2003). While small-scale studies of caste- and region-related population structure continue (Thangaraj et al. 1999; Bamshad et al. 2001; Kumar and Reddy 2003), studies of the kind reported earlier and here will keep adding to the information on the relatedness of diverse population groups in India. This preliminary study aimed to provide a phylogenetic overview of five population groups: four speaking Indo-Aryan dialects, from Uttar Pradesh (UP), Bihar (BI), Punjab (PUNJ), and Bengal (WB), and one from South India (SI) with Dravidian roots. It examines their relationships with each other and with other world populations.

Materials and methods

Blood samples were collected (after seeking the required consent) in RBC lysis buffer from 144 unrelated healthy male donors from UP (n=49), WB (n=10), BI (n=41), PUNJ (n=22), and SI (n=22), and genomic DNA was isolated. The Y-chromosomal biallelic markers M9 C > G (Underhill et al. 1997), 92R7 C > T (Mathias et al. 1994), and SRY 1532 A > G > A (Whitefield et al. 1995) were typed by restriction digestion and confirmed by sequencing (ABI 3100 DNA sequencer, USA); YAP (Hammer 1994) was also studied. An attempt was made to classify the results according to the Y-Consortium nomenclature (Y-Chromosome Consortium 2002). Four Y-STRs, namely DYS390, DYS391, DYS19 (Kayser et al. 1997), and DYS392 (with a designed primer pair: F-5′ ata ctt aga ccc agt tga 3′ and R-5′ atg ttc atc cat att ttc 3′), were typed in 122 Y-chromosomes.

Frequencies of the different alleles of the Y-STR markers were obtained by the simple gene-counting method. Gene diversities (D) were estimated using the formula $D = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$, where $p_i$ is the frequency of the $i$-th allele (or haplotype) and $n$ is the number of chromosomes sampled (Parra et al. 1999). For microsatellite haplotypes, genetic distances (a matrix of Slatkin linearized FSTs) were obtained with ARLEQUIN 2.000 software (Schneider et al. 2000) for world populations drawn from published sources (Gresham et al. 2001; Kayser et al. 2001; Qamar et al. 2002) and for our data. The matrix of Slatkin linearized FSTs was used to construct a neighbor-joining tree with PHYLIP (Version 3.5c) (Felsenstein 1993; PHYLIP home page) and TreeView (Version 1.6.1) (TreeView Web site). A median-joining network was constructed with NETWORK (versions 2.1 and 4.0), weighting each Y-STR locus according to its estimated mutation rate (Kayser et al. 2003).
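The gene-counting and gene-diversity calculations described above can be illustrated with a minimal Python sketch. It is not the software used in this study (ARLEQUIN performed the distance calculations); the function names and the example repeat counts are hypothetical and serve only to show the arithmetic. The last helper shows the usual form of Slatkin's linearization, FST/(1 − FST), assuming a pairwise FST value has already been estimated elsewhere.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Gene counting: frequency of an allele = its count / number of chromosomes."""
    n = len(alleles)
    counts = Counter(alleles)
    return {allele: count / n for allele, count in counts.items()}


def gene_diversity(alleles):
    """Unbiased gene diversity D = n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(alleles)
    if n < 2:
        raise ValueError("at least two chromosomes are needed")
    freqs = allele_frequencies(alleles)
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs.values()))


def slatkin_linearized(fst):
    """Slatkin's linearized distance FST / (1 - FST) for a pairwise FST in [0, 1)."""
    if not 0.0 <= fst < 1.0:
        raise ValueError("FST must lie in [0, 1)")
    return fst / (1.0 - fst)


# Hypothetical repeat counts at one Y-STR locus in ten chromosomes
# (illustrative values only, not data from this study).
example_alleles = [24, 24, 25, 23, 24, 25, 22, 24, 25, 23]

if __name__ == "__main__":
    print(allele_frequencies(example_alleles))
    print(round(gene_diversity(example_alleles), 4))
    print(slatkin_linearized(0.2))
```

The same `gene_diversity` function applies to haplotypes if each chromosome's multi-locus haplotype is passed as a single hashable item, for example a tuple of repeat counts.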

Results and discussion

The gene diversities for the four binary markers (M9, 92R7, SRY1532, and YAP) and the four Y-STRs (DYS390, DYS391, DYS392, and DYS19) were, respectively: UP (0.73 and 0.83), WB (0.78 and 0.76), PUNJ (0.72 and 0.69), BI (0.74 and 0.80), and SI (0.77 and 0.80), with overall averages of 0.73 and 0.78 in the studied Indian population groups. Haplogroups (D and E) 4+ and (A) 7 were absent from all population groups studied. A total of 114 haplotypes were detected in the 122 chromosomes typed across all studied population groups.

The presence of several haplogroups at appreciable frequencies (Fig. 1) indicated not a single origin but a conglomeration of different lineages accumulated over time. This is supported by prehistoric, historic, and linguistic evidence that Middle East/West Asian and Central Asian gene pools contributed to the Indian gene pool (Majumder 2001). The calculated diversity for both biallelic markers and Y-STRs strengthened the inference of strong admixture in Indian populations reported earlier (Saha et al. 2003; Saha and Bamezai 2000), which is also supported by the median joining network (Fig. 2). The presence of haplogroup (K) 26+ at low frequency in Punjab (0.05), reported earlier (Kivisild et al. 2003), and the high frequency (up to 0.62) of the Eurasian lineages P (haplogroup 1+) and R1a (haplogroup 3) in northern populations hinted at a major influence of a western Eurasian genetic component as an effect of later migrations.

It was apparent in this study that lineages F, C (haplogroup 2+), and K (haplogroup 26+) contributed the most and could be the founder lineages of India, showing a high frequency (up to 0.59) in all studied population groups. This indicates that India acted as an incubator of early genetic differentiation of modern humans moving out of Africa to eastern parts (Cann 2001) and that their Y-chromosomes were largely replaced by subsequent migrations or gene flow through demic diffusion (Quintana-Murci et al. 2001) in Neolithic times, supporting the dispersal of Dravidian languages from the Elam province of Iran (Renfrew 1996). Thus, the presence of haplogroup 3 at high frequency and high diversity in the south Indian population (Kivisild et al. 2003) invalidates the concept of its Aryan origin and hints at deep prehistoric differentiation, supported by Y-STR-based subclustering in the median joining network (Fig. 2).

Fig. 1

Distribution of Y-haplogroups in the studied Indian population groups

Fig. 2

Median joining network of Y-haplogroups [(P) 1+, (C & F) 2+, (R1a) 3, and (K) 26+]. Circles represent haplotypes and have an area proportional to frequency

In the unrooted neighbor-joining tree of 38 world populations (Fig. 3), the Indian populations shared a common cluster, together with a West Samoan population of Southeast Asia, indicating that the studied population groups have greater genetic affinity with one another than with other world populations.

Fig. 3

An unrooted neighbor-joining tree based on Slatkin’s linearized FST values showing the relationships between the different Indian and other world population groups