Forensic Science, Medicine and Pathology, Volume 15, Issue 1, pp 67–74

A single nucleotide polymorphism panel for individual identification and ancestry assignment in Caucasians and four East and Southeast Asian populations using a machine learning classifier

  • Hsiao-Lin Hwa
  • Ming-Yih Wu
  • Chih-Peng Lin
  • Wei Hsin Hsieh
  • Hsiang-I Yin
  • Tsui-Ting Lee
  • James Chun-I Lee
Original Article


Abstract

Single nucleotide polymorphism (SNP) profiling is an effective means of individual identification and ancestry inference in forensic genetics. This study established a SNP panel for the simultaneous individual identification and ancestry assignment of Caucasian and four East and Southeast Asian populations. We analyzed 220 SNPs (125 autosomal, 17 X-chromosomal, 30 Y-chromosomal, and 48 mitochondrial SNPs) in DNA samples from 563 unrelated individuals of five populations (89 Caucasian, 234 Taiwanese Han, 90 Filipino, 79 Indonesian, and 71 Vietnamese) and in 18 degraded DNA samples. Informativeness for assignment (I_n) was used to select ancestry informative SNPs (AISNPs). A machine learning classifier, the support vector machine (SVM), was used for ancestry assignment. Of the 220 SNPs, 62 were individual identification SNPs (IISNPs) (51 autosomal and 11 X-chromosomal SNPs) and 191 were AISNPs (100 autosomal, 13 X-chromosomal, 30 Y-chromosomal, and 48 mitochondrial SNPs). The 51 autosomal IISNPs offered cumulative random match probabilities (cRMPs) ranging from 1.56 × 10⁻²¹ to 3.16 × 10⁻²² among these five populations. Using AISNPs with the SVM, the overall accuracy rate of ancestry inference achieved in the testing dataset among Caucasian, Taiwanese Han, and Filipino populations was 88.9%, whereas it was 70.0% between Caucasians and each of the four East and Southeast Asian populations. For the 18 degraded DNA samples with incomplete profiling, the accuracy rate of ancestry assignment was 94.4%. We have developed a 220-SNP panel for simultaneous individual identification and ethnic origin differentiation between Caucasian and the four East and Southeast Asian populations. This SNP panel may assist with DNA analysis in forensic casework.
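The two quantities named in the abstract, the cumulative random match probability and the informativeness for assignment, can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes biallelic SNPs in Hardy-Weinberg equilibrium, independent loci, and the I_n definition of Rosenberg et al. (2003) with equal weights for all populations. The function names and the allele frequencies used below are hypothetical and are not taken from the study.

```python
import math


def snp_match_probability(p):
    """Match probability at one biallelic SNP with allele frequency p.

    Assumes Hardy-Weinberg proportions: the chance that two unrelated
    people share a genotype is the sum of squared genotype frequencies.
    """
    q = 1.0 - p
    genotype_freqs = (p * p, 2.0 * p * q, q * q)
    return sum(g * g for g in genotype_freqs)


def cumulative_rmp(allele_freqs):
    """Product of per-locus match probabilities (assumes independent loci)."""
    result = 1.0
    for p in allele_freqs:
        result *= snp_match_probability(p)
    return result


def informativeness(pop_allele_freqs):
    """I_n for one locus, following Rosenberg et al. (2003).

    pop_allele_freqs is a list with one entry per population; each entry
    lists the frequencies of the alleles at the locus in that population.
    Populations are weighted equally. Terms with zero frequency contribute
    nothing, using the convention 0 * log(0) = 0.
    """
    k = len(pop_allele_freqs)
    n_alleles = len(pop_allele_freqs[0])
    total = 0.0
    for j in range(n_alleles):
        mean_pj = sum(pop[j] for pop in pop_allele_freqs) / k
        if mean_pj > 0:
            total -= mean_pj * math.log(mean_pj)
        for pop in pop_allele_freqs:
            if pop[j] > 0:
                total += (pop[j] / k) * math.log(pop[j])
    return total


# Hypothetical panel: 51 SNPs, each with allele frequency 0.5.
example_crmp = cumulative_rmp([0.5] * 51)

# Hypothetical locus: two populations fixed for different alleles.
example_in = informativeness([[1.0, 0.0], [0.0, 1.0]])
```

With 51 hypothetical SNPs at allele frequency 0.5, the per-locus match probability is 0.375 and the product is on the order of 10⁻²², the same order of magnitude as the cRMPs reported for the 51 autosomal IISNPs. A locus at which two populations are fixed for different alleles reaches the two-population maximum of I_n, ln 2, while a locus with identical frequencies in every population gives 0.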


Keywords

Ancestry assignment, Array, Individual identification, Machine learning classifier, Single nucleotide polymorphism, Support vector machine



Acknowledgments

This work was supported by the Ministry of Science and Technology, Taiwan, R.O.C. [grant number NSC 100-2320-B-002-013-MY3]. The authors thank the National Center for Genome Medicine at Academia Sinica, Taiwan, for technical support with SNP genotyping. This Center was supported by grants from the National Core Facility Program for Biotechnology of the National Science Council, Taiwan, R.O.C. We also thank Ms. Pi-Mei Hsu and Ms. Shwu-Fang Li for technical support with DNA extraction, and Ms. Ai-Jiun Jung for typing. Special thanks are given to the hundreds of individuals who volunteered to provide biological samples for allele frequency data studies.


Funding

This study was funded by the Ministry of Science and Technology, Taiwan, R.O.C. [grant number: NSC 100-2320-B-002-013-MY3].

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Supplementary material

ESM 1: 12024_2018_71_MOESM1_ESM.pdf (PDF, 1081 kb)



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Hsiao-Lin Hwa (1, 2, 3)
  • Ming-Yih Wu (2)
  • Chih-Peng Lin (4)
  • Wei Hsin Hsieh (4)
  • Hsiang-I Yin (1)
  • Tsui-Ting Lee (1)
  • James Chun-I Lee (1), corresponding author

  1. Department and Graduate Institute of Forensic Medicine, College of Medicine, National Taiwan University, Taipei, Taiwan
  2. Department of Obstetrics and Gynecology, National Taiwan University Hospital, Taipei, Taiwan
  3. Department of Medical Genetics, National Taiwan University Hospital, Taipei, Taiwan
  4. Yourgene Bioscience, New Taipei City, Taiwan
