A single nucleotide polymorphism panel for individual identification and ancestry assignment in Caucasians and four East and Southeast Asian populations using a machine learning classifier

Abstract

Single nucleotide polymorphism (SNP) profiling is an effective means of individual identification and ancestry inference in forensic genetics. This study established a SNP panel for the simultaneous individual identification and ancestry assignment of Caucasian and four East and Southeast Asian populations. We analyzed 220 SNPs (125 autosomal, 17 X-chromosomal, 30 Y-chromosomal, and 48 mitochondrial SNPs) in DNA samples from 563 unrelated individuals from five populations (89 Caucasian, 234 Taiwanese Han, 90 Filipino, 79 Indonesian, and 71 Vietnamese) and in 18 degraded DNA samples. Informativeness for assignment (In) was used to select ancestry informative SNPs (AISNPs). A machine learning classifier, the support vector machine (SVM), was used for ancestry assignment. Of the 220 SNPs, 62 were individual identification SNPs (IISNPs) (51 autosomal and 11 X-chromosomal SNPs) and 191 were AISNPs (100 autosomal, 13 X-chromosomal, 30 Y-chromosomal, and 48 mitochondrial SNPs). The 51 autosomal IISNPs offered cumulative random match probabilities (cRMPs) ranging from 1.56 × 10^−21 to 3.16 × 10^−22 among these five populations. Using AISNPs with the SVM, the overall accuracy rate of ancestry inference achieved in the testing dataset among Caucasian, Taiwanese Han, and Filipino populations was 88.9%, whereas it was 70.0% between Caucasians and each of the four East and Southeast Asian populations. For the 18 degraded DNA samples with incomplete profiling, the accuracy rate of ancestry assignment was 94.4%. We have developed a 220-SNP panel for simultaneous individual identification and ethnic origin differentiation between Caucasian and the four East and Southeast Asian populations. This SNP panel may assist with DNA analysis in forensic casework.
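
The two summary statistics named in the abstract, the cumulative random match probability and the informativeness for assignment, can be illustrated with a short sketch. The abstract does not give the formulas the authors used, so the code below rests on stated assumptions: biallelic SNPs, Hardy-Weinberg genotype frequencies, independent loci for the cRMP, and the In definition of Rosenberg et al. (2003) with natural logarithms. The function names and the example allele frequencies are hypothetical and are not taken from the study.

```python
import math


def snp_match_probability(p):
    """Match probability for one biallelic SNP with allele frequency p.

    Assumes Hardy-Weinberg genotype frequencies (p^2, 2pq, q^2) and
    returns the sum of their squares.
    """
    q = 1.0 - p
    genotype_freqs = (p * p, 2.0 * p * q, q * q)
    return sum(g * g for g in genotype_freqs)


def cumulative_rmp(allele_freqs):
    """Product of per-SNP match probabilities (assumes independent loci)."""
    crmp = 1.0
    for p in allele_freqs:
        crmp *= snp_match_probability(p)
    return crmp


def informativeness(freqs_by_pop):
    """Informativeness for assignment (In) of one locus.

    freqs_by_pop holds one allele-frequency list per population; every
    list covers the same alleles and sums to 1.
    """
    k = len(freqs_by_pop)
    total = 0.0
    for j in range(len(freqs_by_pop[0])):
        mean_p = sum(pop[j] for pop in freqs_by_pop) / k
        if mean_p > 0:
            total -= mean_p * math.log(mean_p)
        for pop in freqs_by_pop:
            if pop[j] > 0:
                total += pop[j] * math.log(pop[j]) / k
    return total


if __name__ == "__main__":
    # Hypothetical best case: 51 SNPs, each with allele frequency 0.5.
    print(cumulative_rmp([0.5] * 51))
    # Hypothetical locus fixed for different alleles in two populations.
    print(informativeness([[1.0, 0.0], [0.0, 1.0]]))
```

Under these assumptions a SNP with allele frequency 0.5 gives the smallest possible per-locus match probability (0.375), so 51 such SNPs set a floor on the order of 10^−22, which is the same order of magnitude as the lower end of the cRMP range reported above. A locus with identical allele frequencies in every population has In equal to zero, and In grows as the frequencies diverge.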

Acknowledgements

This work was supported by the Ministry of Science and Technology, Taiwan, R.O.C. [grant number NSC 100-2320-B-002-013-MY3]. The authors thank the National Center for Genome Medicine at Academia Sinica, Taiwan, for SNP genotyping technical support. This Center was supported by grants from the National Core Facility Program for Biotechnology of the National Science Council, Taiwan, R.O.C. We also acknowledge Ms. Pi-Mei Hsu and Ms. Shwu-Fang Li for technical support on DNA extraction, and Ms. Ai-Jiun Jung for typewriting. Special thanks are given to the hundreds of individuals who volunteered to provide biological samples for allele frequency data studies.

Funding

This study was funded by the Ministry of Science and Technology, Taiwan, R.O.C. [grant number: NSC 100-2320-B-002-013-MY3].

Author information

Corresponding author

Correspondence to James Chun-I Lee.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

ESM 1

(PDF 1081 kb)

About this article

Cite this article

Hwa, HL., Wu, MY., Lin, CP. et al. A single nucleotide polymorphism panel for individual identification and ancestry assignment in Caucasians and four East and Southeast Asian populations using a machine learning classifier. Forensic Sci Med Pathol 15, 67–74 (2019). https://doi.org/10.1007/s12024-018-0071-y

Keywords

  • Ancestry assignment
  • Array
  • Individual identification
  • Machine learning classifier
  • Single nucleotide polymorphism
  • Support vector machine