Single nucleotide polymorphism (SNP) profiling is an effective means of individual identification and ancestry inference in forensic genetics. This study established a SNP panel for simultaneous individual identification and ancestry assignment in Caucasians and four East and Southeast Asian populations. We analyzed 220 SNPs (125 autosomal, 17 X-chromosomal, 30 Y-chromosomal, and 48 mitochondrial) in DNA samples from 563 unrelated individuals of five populations (89 Caucasian, 234 Taiwanese Han, 90 Filipino, 79 Indonesian, and 71 Vietnamese) and in 18 degraded DNA samples. Informativeness for assignment (Iₙ) was used to select ancestry informative SNPs (AISNPs). A machine learning classifier, the support vector machine (SVM), was used for ancestry assignment. Of the 220 SNPs, 62 were individual identification SNPs (IISNPs) (51 autosomal and 11 X-chromosomal) and 191 were AISNPs (100 autosomal, 13 X-chromosomal, 30 Y-chromosomal, and 48 mitochondrial). The 51 autosomal IISNPs gave cumulative random match probabilities (cRMPs) ranging from 1.56 × 10⁻²¹ to 3.16 × 10⁻²² across the five populations. Using the AISNPs with the SVM, the overall accuracy of ancestry inference in the testing dataset was 88.9% among the Caucasian, Taiwanese Han, and Filipino populations, whereas it was 70.0% between Caucasians and each of the four East and Southeast Asian populations. For the 18 degraded DNA samples with incomplete profiles, the accuracy of ancestry assignment was 94.4%. We have developed a 220-SNP panel for simultaneous individual identification and ethnic origin differentiation between Caucasians and the four East and Southeast Asian populations. This SNP panel may assist DNA analysis in forensic casework.
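The two statistics named in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation. It assumes biallelic SNPs, Hardy-Weinberg equilibrium within each population, and independence between loci, and it uses the informativeness-for-assignment definition of Rosenberg et al. (2003). The function names and the allele frequencies in the usage note are hypothetical.

```python
import math


def locus_rmp(p):
    """Random match probability at one biallelic locus.

    Assumes Hardy-Weinberg equilibrium: the RMP is the sum of the
    squared genotype frequencies p^2, 2pq and q^2.
    """
    q = 1.0 - p
    return (p * p) ** 2 + (2 * p * q) ** 2 + (q * q) ** 2


def cumulative_rmp(freqs):
    """Product of per-locus RMPs, assuming the loci are independent."""
    return math.prod(locus_rmp(p) for p in freqs)


def informativeness(pop_freqs):
    """Informativeness for assignment of one biallelic SNP.

    pop_freqs holds the frequency of one allele in each of K
    populations. Uses natural logarithms, so the value lies between
    0 (identical frequencies) and ln(K).
    """
    k = len(pop_freqs)
    total = 0.0
    # Loop over both alleles: the given one and its complement.
    for allele in (pop_freqs, [1.0 - p for p in pop_freqs]):
        mean = sum(allele) / k
        if mean > 0:
            total -= mean * math.log(mean)
        # Terms with p = 0 contribute nothing (p * log p -> 0).
        total += sum(p * math.log(p) for p in allele if p > 0) / k
    return total


if __name__ == "__main__":
    # Hypothetical panel: 51 loci, each with allele frequency 0.5.
    print(cumulative_rmp([0.5] * 51))
    # Hypothetical SNP fixed for different alleles in two populations.
    print(informativeness([1.0, 0.0]))
```

Under these assumptions a locus with allele frequency 0.5 has the lowest possible RMP (0.375), so 51 such hypothetical loci give a cumulative RMP on the order of 10⁻²², the same order of magnitude as the values reported for the 51 autosomal IISNPs. A SNP whose frequencies are identical in all populations scores 0 for informativeness, which is why such markers suit individual identification rather than ancestry assignment.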
This work was supported by the Ministry of Science and Technology, Taiwan, R.O.C. [grant number NSC 100-2320-B-002-013-MY3]. The authors thank the National Center for Genome Medicine at Academia Sinica, Taiwan, for technical support with SNP genotyping. The Center was supported by grants from the National Core Facility Program for Biotechnology of the National Science Council, Taiwan, R.O.C. We also thank Ms. Pi-Mei Hsu and Ms. Shwu-Fang Li for technical support with DNA extraction, and Ms. Ai-Jiun Jung for typing the manuscript. Special thanks go to the hundreds of individuals who volunteered biological samples for the allele frequency studies.
This study was funded by the Ministry of Science and Technology, Taiwan, R.O.C. [grant number NSC 100-2320-B-002-013-MY3].
Conflict of interest
The authors declare that they have no conflict of interest.
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed consent was obtained from all individual participants included in the study.
Cite this article
Hwa, HL., Wu, MY., Lin, CP. et al. A single nucleotide polymorphism panel for individual identification and ancestry assignment in Caucasians and four East and Southeast Asian populations using a machine learning classifier. Forensic Sci Med Pathol 15, 67–74 (2019). https://doi.org/10.1007/s12024-018-0071-y
- Ancestry assignment
- Individual identification
- Machine learning classifier
- Single nucleotide polymorphism
- Support vector machine