Background

DNA evidence has become an influential tool in the forensic sciences for linking a suspect to a crime scene, resolving questions of biological relationship, and identifying victims of mass disasters. Advances in DNA technology have extended into diverse areas, and DNA evidence based on Y-SNPs, Y chromosome short tandem repeats (Y-STRs), and mitochondrial DNA (mtDNA) offers immense possibilities for assisting the criminal justice system. Because the Y chromosome is unique among human chromosomes, Y chromosome haplogroups or haplotypes have been used to identify criminals in forensic cases (Jobling et al. 1997), trace paternal lineages in human evolution (Jobling and Tyler-Smith 1995), study diseases in medical genetics (Jobling and Tyler-Smith 2000), and reconstruct pedigrees in genealogy (Jobling 2001). Although forensic genetics covers a broad range of disciplines, such as forensic pathology (Alacs et al. 2010), complex traits (Kayser and Schneider 2009; Pulker et al. 2007), and wildlife forensics (Budowle et al. 2005), short tandem repeat (STR)-based DNA testing (Edwards et al. 1992) is now accepted as the principal approach in routine paternity investigations (Zupanic Pajnic et al. 2001), identification of skeletal remains (Zupanic Pajnic et al. 2010), and complex criminal cases such as rape and gang rape. STRs make up nearly 3% of the human genome and occur on average once every 10,000 nucleotides (Butler 2005). Multiplexing facilitates the use of these markers in forensic anthropology and medicolegal studies. At present, a number of laboratories perform STR analysis in population genetic studies and report data for various ethnic populations (Tandon et al. 2002; Sarkar and Kashyap 2002; Sahoo and Kashyap 2002; Gaikwad and Kashyap 2002; Rajkumar and Kashyap 2002; Narkuti et al. 2008; Dubey et al. 2008; Giroti and Talwar 2010; Ghosh et al. 2011; Chaudhari and Dahiya 2014; Shrivastava et al. 2015; Shrivastava et al. 2016; Jain et al. 2017; Imam et al. 2017). However, despite being the most consistent and frequently used genetic markers in forensics, STRs have drawbacks that limit their efficacy. STRs give precise results on well-preserved bone and soft tissue samples, but the amplicon sizes required for STR typing (150–450 bp) are too large to permit practical amplification of fragmented DNA templates.
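Why amplicon length matters for fragmented templates can be illustrated with a deliberately simplified model. The sketch below assumes, purely for illustration, that strand breaks occur independently at each nucleotide with a fixed probability p; under that assumption the chance that a target of L nucleotides survives unbroken is (1 - p)^L, which falls off geometrically with amplicon size. The function name `intact_fraction` and the value of p are hypothetical and not taken from this study.

```python
def intact_fraction(amplicon_bp: int, break_prob_per_nt: float) -> float:
    """Probability that a target of `amplicon_bp` nucleotides contains no break.

    Hypothetical toy model: breaks occur independently at each nucleotide with
    probability `break_prob_per_nt`. Real degradation is not this uniform.
    """
    if not 0.0 <= break_prob_per_nt <= 1.0:
        raise ValueError("break probability must lie in [0, 1]")
    return (1.0 - break_prob_per_nt) ** amplicon_bp


if __name__ == "__main__":
    p = 0.01  # assumed, illustrative break probability per nucleotide
    # 150 and 450 bp bracket the STR amplicon range quoted in the text.
    for size in (150, 450):
        print(f"{size} bp: {intact_fraction(size, p):.3f}")
```

With p = 0.01, roughly 22% of 150 bp targets but only about 1% of 450 bp targets remain intact in this toy model, which is the intuition behind preferring short amplicons for degraded DNA.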

Compared with STR-based DNA profiling alone, SNP markers provide valuable additional information. SNPs are an abundant source of human genome diversity for testing (Cooper et al. 1985; Wang et al. 1998), and SNP profiling offers some benefits over STR markers as a tool for DNA identification (Sinha et al. 2017).

Y chromosome phylogeny (phylogeography) can be studied using bi- or multi-allelic markers (Jobling and Tyler-Smith 2000; Y Chromosome Consortium 2002). The non-recombining region of the Y chromosome (NRY), the largest non-recombining stretch of DNA in the human genome, carries many stable markers, which makes it well suited to evolutionary studies. Because of their high geographic specificity, Y-SNP haplogroups can be used to understand admixture and stratification between populations (Jobling and Tyler-Smith 2003). The greater mutational stability (lower mutation rate) of Y chromosome SNPs, together with the short amplicons needed to type them, makes them advantageous for typing highly degraded DNA (Thomson et al. 2000; Sobrino et al. 2005; Chakraborty et al. 1999). Y chromosome haplogroup O-M175 is an important marker for East and Southeast Asia because it is the most ubiquitous Y chromosome lineage there, accounting for about 75% of male lineages in mainland China (Su et al. 1999) and 87% in Southeast Asia (Karafet et al. 2005; Li et al. 2008; Karafet et al. 2010; Delfin et al. 2011; He et al. 2012). In the present study, haplogroup O-M175 was observed in 84.79% of the sampled men, which is noteworthy because it is the most ubiquitous Y lineage in mainland China and in Southeast Asian populations such as those of Malaysia, Indonesia, and Vietnam (Karafet et al. 2008).

Many Indian studies have reported frequencies of Y chromosome haplogroups in tribes and castes of diverse ethnic and linguistic backgrounds (Kumar et al. 2007; Sharma et al. 2012; Khurana et al. 2014; Singh et al. 2016). These studies dissect the Y chromosomal haplogroup pool and help in understanding the current genetic landscape of Indian populations. Building on their findings, the present study was conducted on two important indigenous tribal populations of South India: the Porja and the Savara.

The Porja are mainly distributed along the hill slopes of the Munchingputtu, Anantagiri, and Peddabayalu regions of Visakhapatnam, Andhra Pradesh (AP), India. They migrated from Odisha to their present habitat about 300 years ago. The Savara live in the Lakaiguda, Mettiguda, Chintalaguda, and Manduguda regions of Srikakulam, AP, India. The Savara language belongs to the Kol Munda group of the Austro-Asiatic language family.

Materials and methodology

Sampling area

To investigate the genetic architecture of the southeast coastal Indian Porja and Savara tribes of the Visakhapatnam and Srikakulam districts of Andhra Pradesh, 217 blood samples were collected from healthy, unrelated male members of these tribes. The habitat of the studied populations is shown in Fig. 1. The study was carried out under the national project “DNA Polymorphisms” of the Anthropological Survey of India, Kolkata, and was approved by the ethics committee of the Anthropological Survey of India.

Fig. 1

Map of India highlighting Andhra Pradesh (left); sample collection area (right)

Sample collection and DNA extraction

After individual informed consent was obtained from volunteer donors, a trained medical practitioner of the Primary Health Centre (PHC) serving the villages drew 5 ml of blood from each donor into EDTA-coated vacutainers. The samples were transported to the DNA Laboratory of the Anthropological Survey of India, Southern Regional Centre, Mysore, Karnataka, India, for DNA extraction and analysis. DNA was extracted by the phenol-chloroform method (Phenol-Chloroform Isoamyl Alcohol (PCI) DNA Extraction 1998) and quantified with a UV-visible spectrophotometer (Perkin-Elmer) by absorbance at 260 nm, with purity assessed from the A260/A280 ratio.
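The spectrophotometric step can be sketched as follows. The sketch uses the conventional factor of 50 µg/ml per A260 unit for double-stranded DNA and the common rule of thumb that an A260/A280 ratio near 1.8 indicates relatively pure DNA; the 1.7 to 2.0 acceptance window, the example readings, and the function names are illustrative assumptions, not values reported by this study.

```python
DSDNA_UG_PER_ML_PER_A260 = 50.0  # conventional conversion factor for dsDNA


def dsdna_concentration_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate dsDNA concentration (µg/ml) from absorbance at 260 nm."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280 ratio; protein absorbs at 280 nm, so contamination lowers it."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def passes_purity(a260: float, a280: float, low: float = 1.7, high: float = 2.0) -> bool:
    """Hypothetical acceptance window; laboratories set their own thresholds."""
    return low <= purity_ratio(a260, a280) <= high


if __name__ == "__main__":
    # Hypothetical readings for a sample diluted 1:10 before measurement.
    a260, a280 = 0.45, 0.25
    print(dsdna_concentration_ug_per_ml(a260, dilution_factor=10))
    print(round(purity_ratio(a260, a280), 2), passes_purity(a260, a280))
```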

PCR and sequencing

A set of 15 bi-allelic SNP markers was analyzed to identify Y chromosome haplogroups, using primer sets described elsewhere (Karafet et al. 2008). Polymerase chain reaction (PCR) cycling conditions for each primer pair were standardized in the DNA Laboratory, Anthropological Survey of India, Southern Regional Centre, Mysore, Karnataka, India: initial denaturation at 95 °C for 5 min, followed by cycles of denaturation at 94 °C for 1 min, annealing at a primer-specific temperature of 51–58 °C, and extension at 72 °C for 2 min 30 s, with a final extension at 72 °C for 7 min. The amplified products were sequenced directly using the BigDye™ Terminator Cycle Sequencing Kit on an ABI PRISM 3730 DNA Analyzer (Applied Biosystems, USA).
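The thermal profile above can be written down as a small data structure, which also makes the run time easy to estimate. The cycle number and the annealing hold time are not stated in the text, so the sketch leaves them as required parameters; the example values in the usage block (55 °C, 60 s, 30 cycles) and the names `Step`, `build_program`, and `hold_time_s` are hypothetical. Ramp times between temperatures are ignored.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


def build_program(anneal_c: float, anneal_s: int, cycles: int) -> list[Step]:
    """Expand the reported profile into a flat list of steps.

    `anneal_s` and `cycles` are NOT reported in the text and must be supplied.
    """
    if not 51 <= anneal_c <= 58:
        raise ValueError("reported primer-specific annealing range is 51-58 °C")
    if cycles < 1:
        raise ValueError("at least one cycle is required")
    cycle = [
        Step("denaturation", 94, 60),
        Step("annealing", anneal_c, anneal_s),
        Step("extension", 72, 150),  # 2 min 30 s
    ]
    return (
        [Step("initial denaturation", 95, 300)]
        + cycle * cycles
        + [Step("final extension", 72, 420)]
    )


def hold_time_s(program: list[Step]) -> int:
    """Total time spent at hold temperatures, excluding ramping."""
    return sum(step.seconds for step in program)


if __name__ == "__main__":
    # Hypothetical choices: 55 °C annealing for 60 s over 30 cycles.
    program = build_program(anneal_c=55, anneal_s=60, cycles=30)
    print(len(program), hold_time_s(program) / 60, "min")
```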

Statistical analysis

The generated sequences were aligned with the corresponding reference sequences using SeqScape software v2.5 (Applied Biosystems, USA). Y chromosome binary haplogroups were assigned according to the revised Y chromosome phylogenetic tree (Karafet et al. 2008).
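The assignment logic can be illustrated with a toy tree. The sketch walks from the root to the deepest node whose defining marker is in the derived state and appends the paragroup asterisk (as in H1* or O2a*) when the sample is ancestral at every tested marker immediately downstream of that node. The tree contains only markers named in this article and is a simplified, hypothetical subset: in the actual Karafet et al. (2008) phylogeny these haplogroups sit under deeper nodes and carry many more markers. The names `Node`, `assign`, `TREE`, and `TESTED` are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    marker: str | None  # defining SNP; None only for the root
    children: list[Node] = field(default_factory=list)


def assign(node: Node, derived: set[str], tested: set[str]) -> str:
    """Return the most-derived haplogroup supported by the SNP calls."""
    for child in node.children:
        if child.marker in derived:
            return assign(child, derived, tested)
    if node.marker is None:
        return "unassigned"  # no tested top-level marker was derived
    if any(child.marker in tested for child in node.children):
        return node.name + "*"  # paragroup: ancestral at tested sub-clade markers
    return node.name


# Simplified toy tree built only from markers mentioned in the text.
TREE = Node("root", None, [
    Node("O", "M175", [Node("O1", "MSY2.2"), Node("O2", "P31"), Node("O3", "M122")]),
    Node("R", "M207", [Node("R1", "M173"), Node("R2", "M124")]),
    Node("H", "M69", [Node("H1", "M52"), Node("H2", "APT")]),
])
TESTED = {"M175", "MSY2.2", "P31", "M122", "M207", "M173", "M124", "M69", "M52", "APT"}

if __name__ == "__main__":
    print(assign(TREE, {"M207", "M124"}, TESTED))  # R2
    print(assign(TREE, {"M69"}, TESTED))           # H*
```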

Results and discussion

Among the 217 men studied, we observed four haplogroups (H1*, H1a*, O2a*, and R2), of which O2a* was the most frequent at 84.79% (42.86% in Porja and 41.94% in Savara, both expressed as percentages of the pooled sample of 217) (Table 1; Fig. 2). The remaining three haplogroups, R2, H1*, and H1a*, occurred at low frequencies in both populations, with per-population frequencies ranging from 1.38% to 3.69%, which is consistent with previous studies of Southeast Asian populations (Karafet et al. 2005; Karafet et al. 2008; Li et al. 2008; Karafet et al. 2010; Delfin et al. 2011; He et al. 2012). Each haplogroup is described in detail below.
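The reported percentages can be cross-checked with a short calculation. The counts below are not reported in the text: they are back-calculated on the assumption that every percentage is relative to the pooled total of 217 men (for example, 93 and 91 O2a* carriers reproduce 42.86% and 41.94%, and together they reproduce 84.79%). Treat them as an inference, not as published data; the dictionary and function names are illustrative.

```python
from __future__ import annotations

# INFERRED counts (back-calculated from the reported percentages, assuming
# all percentages are relative to the pooled sample of 217 men).
INFERRED_COUNTS = {
    ("O2a*", "Porja"): 93,
    ("O2a*", "Savara"): 91,
    ("R2", "Porja"): 7,
    ("R2", "Savara"): 8,
    ("H1*", "both"): 10,   # per-population split not recoverable from the text
    ("H1a*", "both"): 8,   # per-population split not recoverable from the text
}


def percent_of_pool(counts: dict[tuple[str, str], int]) -> dict[tuple[str, str], float]:
    """Express each count as a percentage of the pooled total, to 2 decimals."""
    total = sum(counts.values())
    return {key: round(100 * n / total, 2) for key, n in counts.items()}


def pooled_by_haplogroup(counts: dict[tuple[str, str], int]) -> dict[str, int]:
    """Sum counts over populations for each haplogroup."""
    pooled: dict[str, int] = {}
    for (haplogroup, _population), n in counts.items():
        pooled[haplogroup] = pooled.get(haplogroup, 0) + n
    return pooled


if __name__ == "__main__":
    total = sum(INFERRED_COUNTS.values())
    print(total)                                   # pooled sample size
    print(percent_of_pool(INFERRED_COUNTS))
    o2a = pooled_by_haplogroup(INFERRED_COUNTS)["O2a*"]
    print(round(100 * o2a / total, 2))             # pooled O2a* frequency
```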

Table 1 Y haplogroup distribution of the studied populations (Porja and Savara)
Fig. 2

Phylogenetic distribution of Y-SNPs in 217 South Indian individuals (present study). The observed frequencies are shown to the right of each sub-haplogroup, alongside those of 16 tribal populations from neighboring states

Haplogroup O

Haplogroup O, defined by M175 (a 5-bp deletion), was found at the highest frequency, 84.79%. It possibly originated in East Asia (Karafet et al. 2008) and subsequently spread to South Asia and the Pacific. The paternal signature of haplogroup O can be traced at moderate or low frequencies in parts of Central Asia and Oceania (Cai et al. 2011; Karafet et al. 2001; Underhill et al. 2001; Deng et al. 2004). Haplogroup O is further divided into three sub-clades: O1 (defined by MSY2.2), O2 (P31), and O3 (M122). The most frequent sub-clade in the present study was O2a*, at 42.86% in Porja and 41.94% in Savara. O2a lineages are found in Southeast Asian populations of Malaysia, Vietnam, Indonesia, and Southern China (Sengupta et al. 2006).

Haplogroup R

Haplogroup R is defined by M207 and is divided into two sub-clades: R1, identified by the M173 A>C allele, and R2, identified by the M124 C>T allele. Sub-clade R1-M173 is estimated to have arisen in Asia about 27,000 years ago, around the Last Glacial Maximum (LGM), and is likely to be found in Southwestern Asia (Zhao et al. 2009). In the present study, the R2 lineage accounts for 3.23% (Porja) and 3.69% (Savara) of the pooled sample.

Haplogroup H

Haplogroup H is identified by the M69 T>C allele and is divided into two sub-clades: H1, identified by the M52 A>C allele, and H2, identified by the APT G>A allele. Because of its high frequency in Indian tribal groups, haplogroup H is often regarded as an indigenous Indian haplogroup associated with the earliest settlers. We found H1* and H1a* at low frequencies in the present samples, about 4.61% and 3.69%, respectively. Haplogroup H has also been reported from Central Asia, Western Asia, and Europe (Wells et al. 2001; Regueiro et al. 2006). The low frequency of H1* and H1a* in the present study may reflect the concentration of these lineages among the Dravidian-speaking tribes of South India, with a more limited presence in the rest of the Indian subcontinent (Thomson et al. 2000; Li et al. 2008; Zhao et al. 2009; Cordaux et al. 2004).

Phylogenetic analysis

Figure 3 presents a multidimensional scaling (MDS) plot based on Y-SNP haplogroup frequencies, which shows that the Birja (AA), Juang (AA), Santhal (AA), and Ho (AA) tribes are closely related to the populations studied here, Porja and Savara. Of the two language groups, Austro-Asiatic (AA) and Dravidian (DR), all the AA populations show closer genetic affinity with Porja and Savara. The Yerukula, Chenchu, and Naikpodgond form different clusters in the MDS plot (Fig. 3), probably because they lack haplogroup O2a, which is specific to AA linguistic groups (Table 1). Except for the Nagesia (DR), the remaining DR populations in the cluster, namely the Oraon, Paharia, Naikpodgond, Chenchu, and Yerukula, are distant from the presently studied populations, probably because of the low frequency of haplogroup O2a. The Dravidian-speaking Porja show closer affinity with AA populations, possibly because of admixture or gene flow with Austro-Asiatic populations. In general, samples from the present study and previous reference samples with the same linguistic or ethnic affiliation tended to lie closer together in the MDS plot, although there were a few exceptions.
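An MDS plot of this kind starts from a matrix of pairwise genetic distances between populations. The figures in this article use ΦST; as a simpler stand-in, the sketch below computes Nei's G_ST, (H_T - H_S) / H_T, directly from haplogroup frequencies, ignoring molecular differences between haplogroups and sample-size corrections. It is not the AMOVA-based ΦST used for the figures, and the population names and frequencies in the usage block are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def gene_diversity(freqs: dict[str, float], groups: set[str]) -> float:
    """1 - sum(p_i^2): probability that two random lineages differ."""
    return 1.0 - sum(freqs.get(g, 0.0) ** 2 for g in groups)


def nei_gst(p1: dict[str, float], p2: dict[str, float]) -> float:
    """Nei's G_ST for two populations weighted equally."""
    groups = set(p1) | set(p2)
    h_s = (gene_diversity(p1, groups) + gene_diversity(p2, groups)) / 2
    mean = {g: (p1.get(g, 0.0) + p2.get(g, 0.0)) / 2 for g in groups}
    h_t = gene_diversity(mean, groups)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def distance_matrix(pops: dict[str, dict[str, float]]) -> dict[tuple[str, str], float]:
    """Symmetric pairwise G_ST for every pair of named populations."""
    out = {}
    for a, b in combinations(pops, 2):
        out[(a, b)] = out[(b, a)] = nei_gst(pops[a], pops[b])
    return out


if __name__ == "__main__":
    # Hypothetical haplogroup frequencies for three unnamed populations.
    pops = {
        "pop1": {"O2a": 0.8, "H1": 0.1, "R2": 0.1},
        "pop2": {"O2a": 0.7, "H1": 0.2, "R2": 0.1},
        "pop3": {"H1": 0.6, "R2": 0.3, "L": 0.1},
    }
    for pair, d in distance_matrix(pops).items():
        print(pair, round(d, 3))
```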

Fig. 3

Genetic relationships of populations based on ΦST distances estimated from Y chromosome haplogroup frequencies. AA, Austro-Asiatic; DR, Dravidian. The present samples (Savara and Porja) are compared with the data set of Vikrant et al. 2007

In the MDS plot (Fig. 4) based on Y-SNP haplogroup frequencies, the Porja and Savara samples from AP and those from Odisha fell in different clusters, indicating differences between the same populations sampled from two regions of India, probably owing to genetic differences and the low frequency of haplogroup O2a. The Savara samples from AP (present study) form a separate cluster, whereas the samples from Jharkhand, Odisha, and West Bengal cluster together, owing to genetic differences among the populations and the lower frequency of haplogroup O2 in the samples from Odisha, West Bengal, and Jharkhand.
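The MDS step itself can be sketched as classical (Torgerson) scaling: square the distances, double-centre them to obtain a Gram matrix, and scale its leading eigenvectors by the square roots of their eigenvalues. The article does not state which MDS variant or software was used, so classical scaling is an assumption here; the eigen-decomposition uses a plain-Python Jacobi rotation routine so that the sketch has no dependencies. Negative eigenvalues, which arise when the input distances are not Euclidean, are clipped to zero. Function names are illustrative.

```python
import math


def jacobi_eigen(matrix, tol=1e-20, max_sweeps=100):
    """Eigenvalues and eigenvectors (as columns) of a symmetric matrix."""
    n = len(matrix)
    a = [list(map(float, row)) for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    scale = sum(x * x for row in a for x in row)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen to zero the (p, q) element.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V P
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def classical_mds(dist, k=2):
    """Classical (Torgerson) MDS: k-dimensional coordinates from a distance matrix."""
    n = len(dist)
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in d2]
    grand_mean = sum(row_mean) / n
    # Double-centred Gram matrix B = -1/2 * J D^2 J (D^2 is symmetric).
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean) for j in range(n)]
         for i in range(n)]
    values, vectors = jacobi_eigen(b)
    top = sorted(range(n), key=lambda j: values[j], reverse=True)[:k]
    # Coordinates = eigenvector * sqrt(eigenvalue); negative eigenvalues clipped.
    return [[vectors[i][j] * math.sqrt(max(values[j], 0.0)) for j in top] for i in range(n)]


if __name__ == "__main__":
    # Hypothetical Euclidean example: a 3-4-5 right triangle.
    d = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
    for point in classical_mds(d):
        print([round(x, 3) for x in point])
```

With Euclidean input such as the triangle, the recovered coordinates reproduce the original distances up to rotation and reflection; with ΦST-type matrices the plot is only an approximation, which is why MDS figures are usually read for relative clustering rather than absolute positions.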

Fig. 4

Genetic relationships of populations based on ΦST distances estimated from Y chromosome haplogroup frequencies. AP, Andhra Pradesh; OR, Orissa (Odisha); WB, West Bengal; JH, Jharkhand. The present samples (Savara and Porja) from AP are compared with the data set of Vikrant et al. 2007

In the present study, we examined the genetic components of two populations from Southern India to identify their origins and their present-day levels of genetic similarity. We further explored whether sub-population, linguistic, socio-cultural, or gender-based demographic patterns influenced the genetic makeup of these population groups. A total of 217 individuals were typed for Y chromosome polymorphisms using a set of 15 bi-allelic markers on the non-recombining region of the Y chromosome; this limited marker set may introduce a slight bias against some rare lineages. Despite the limited size of the studied samples and of the reference samples from the border areas of Andhra Pradesh, the diversity of Y chromosome lineages suggests that the gene pool of Andhra Pradesh, especially of its tribal populations, includes lineages with known phylogeographic origins in Europe and Southeast Asia, while their Y chromosomes also retain traces of the original inhabitants of the subcontinent.

Conclusion

The distribution of Y-SNP haplogroups in the studied populations will improve the resolution of haplogroup-based inference, can play a crucial role in assigning geographic identity to individual haplogroups, and should make it easier to determine the biogeography of southeast coastal Indians. However, before these haplogroup data are applied in forensic cases, a larger number of populations should be analyzed with different markers, and the distinctiveness of geographic regions should be compared against the distributions of the different haplogroups. These Y chromosomal SNP haplotypes show features similar to those of some mainland haplogroups. The data generated here can be used to trace the patrilineal roots of the tested haplogroups. Although this information about Y chromosomal haplogroups and haplotypes is restricted to the tested populations, it can provide useful data for understanding patrilineal biogeographic ancestry, especially in disaster victim identification (DVI). The branches of the Y chromosome tree are strongly associated with geographic regions, which makes the tree effective for mapping ancestral routes, because Y-SNP markers link ethnicity and geography to particular haplogroup frequencies. Y-SNP markers have established their efficacy in case identification, but these uniparental markers are less established for inferring geographic ancestry, although SNP markers were used for DVI in a major disaster, the terrorist attack of 11 September 2001 on the World Trade Center (WTC) in New York City, because STR amplicons were too long for the analysis of heavily degraded samples. Finally, because their mutation rate is about 100,000 times lower than that of STRs, Y-SNPs may be superior for kinship testing and may replace STRs for such purposes once commercial kits become available.
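The practical meaning of this mutation-rate gap can be made concrete with a back-of-the-envelope calculation. For a single marker with mutation rate μ per generation, the probability of at least one mutation over g generations is 1 - (1 - μ)^g. The per-generation rates below are illustrative assumptions chosen only to reflect the roughly 100,000-fold difference stated above, not measured values, and the helper name `p_any_mutation` is hypothetical.

```python
import math


def p_any_mutation(rate_per_generation: float, generations: int) -> float:
    """Probability of at least one mutation at one marker over `generations`.

    Computed as 1 - (1 - rate)^generations via log1p/expm1, which stays
    accurate for very small rates.
    """
    if not 0.0 <= rate_per_generation < 1.0:
        raise ValueError("rate must lie in [0, 1)")
    return -math.expm1(generations * math.log1p(-rate_per_generation))


if __name__ == "__main__":
    STR_RATE = 2e-3             # assumed, illustrative per-locus rate for an STR
    SNP_RATE = STR_RATE / 1e5   # assumed ~100,000-fold lower, as stated in the text
    for g in (1, 10, 20):
        print(g, p_any_mutation(STR_RATE, g), p_any_mutation(SNP_RATE, g))
```

Under these assumed rates, an STR locus has a chance of roughly 4% of changing within 20 generations, whereas a single SNP site is essentially unchanged along the same paternal line.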