SNPs for a universal individual identification panel
An efficient method to uniquely identify every individual would be valuable for quality control and sample tracking in large collections of cell lines or DNA, which are now common in whole genome association studies. Such a method would also be useful in forensics. SNPs represent the best markers for such purposes. We have developed a globally applicable resource of 92 SNPs for individual identification (IISNPs) with extremely low probabilities of any two unrelated individuals from anywhere in the world having identical genotypes. The SNPs were identified by screening over 500 candidate SNPs on samples from 44 populations representing the major regions of the world. All 92 IISNPs have an average heterozygosity >0.4 and Fst values <0.06 across our 44 populations, making this a universally applicable panel irrespective of ethnicity or ancestry. No significant linkage disequilibrium (LD) occurs for any of the unique pairings of 86 of the 92 IISNPs (median LD = 0.011) in any of the 44 populations. Because of close linkage, the remaining 6 IISNPs show strong LD in most of the 44 populations for a small subset (7) of the unique pairings in which they occur. Forty-five of the 86 SNPs are spread across the 22 human autosomes and show very loose or no genetic linkage with each other. These 45 IISNPs constitute an excellent panel for individual identification, including paternity testing, with associated probabilities of individual genotypes less than 10⁻¹⁵, smaller than those achieved with the current panels of forensic markers. This panel also improves on an interim panel of 40 IISNPs previously identified using 40 population samples. Because the 45 SNPs in this subset are unlinked, they are also useful in situations involving close biological relationships. Comparisons with random sets of SNPs illustrate the greater discriminating power, efficiency, and more universal applicability of this IISNP panel to populations around the world.
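The two screening criteria stated above (average heterozygosity >0.4 and Fst <0.06 across populations) can be illustrated with a minimal Python sketch. It rests on assumptions not given in this section: each SNP is biallelic, heterozygosity is the Hardy-Weinberg expectation 2pq, and Fst is the simple equal-weight, variance-based form var(p) / (p̄(1 − p̄)). The estimator actually used by the authors is not stated here, and the function names and example frequencies are hypothetical.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2pq of a biallelic SNP under Hardy-Weinberg."""
    return 2.0 * p * (1.0 - p)


def fst(freqs):
    """Variance-based Fst from per-population allele frequencies.

    Assumption: equal population weights and no sample-size correction.
    This is an illustrative estimator, not necessarily the authors' one.
    """
    n = len(freqs)
    p_bar = sum(freqs) / n
    if p_bar <= 0.0 or p_bar >= 1.0:
        return 0.0  # monomorphic everywhere: no differentiation to measure
    variance = sum((p - p_bar) ** 2 for p in freqs) / n
    return variance / (p_bar * (1.0 - p_bar))


def passes_screen(freqs, min_het=0.4, max_fst=0.06):
    """Apply the thresholds quoted in the text to one candidate SNP."""
    avg_het = sum(expected_heterozygosity(p) for p in freqs) / len(freqs)
    return avg_het > min_het and fst(freqs) < max_fst


# Hypothetical allele frequencies in three populations (not real data).
balanced_snp = [0.45, 0.50, 0.55]   # similar, intermediate frequencies
divergent_snp = [0.10, 0.50, 0.90]  # frequencies differ widely by population

print(passes_screen(balanced_snp))
print(passes_screen(divergent_snp))
```

A SNP with intermediate and similar frequencies everywhere passes both thresholds, whereas one whose frequency varies strongly among populations fails on both heterozygosity and Fst.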
The full set of 86 IISNPs that do not show LD can be used to provide even smaller genotype match probabilities, in the range of 10⁻³¹ to 10⁻³⁵, based on the 44 population samples studied.
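The order of magnitude of these match probabilities can be checked with a short Python sketch. It assumes Hardy-Weinberg proportions at each SNP, statistical independence among SNPs (no LD), and a single population with known allele frequencies; the function names are hypothetical and this is not necessarily the exact computation used by the authors.

```python
import math


def genotype_match_probability(p):
    """Probability that two unrelated individuals share a genotype at one
    biallelic SNP with allele frequency p, assuming Hardy-Weinberg:
    the sum of squared genotype frequencies (p^2, 2pq, q^2)."""
    q = 1.0 - p
    return (p * p) ** 2 + (2.0 * p * q) ** 2 + (q * q) ** 2


def log10_panel_match_probability(freqs):
    """log10 of the match probability over independent SNPs.

    Summing logs avoids numerical underflow for large panels.
    """
    return sum(math.log10(genotype_match_probability(p)) for p in freqs)


# Idealized panels in which every SNP has allele frequency 0.5
# (a hypothetical best case, not the real IISNP frequencies).
print(log10_panel_match_probability([0.5] * 45))
print(log10_panel_match_probability([0.5] * 86))
```

At p = 0.5 the per-SNP match probability takes its minimum value of 0.375, so these idealized panels give a lower bound on the match probability for 45 and 86 independent SNPs. Real allele frequencies away from 0.5 give somewhat larger values, which is consistent with the thresholds reported in the text.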
Keywords: Linkage Disequilibrium, Match Probability, Individual Identification, Average Heterozygosity, Pairwise Linkage Disequilibrium
This work was funded primarily by NIJ Grants 2004-DN-BX-K025 and 2007-DN-BX-K197 to KKK awarded by the National Institute of Justice, Office of Justice Programs, US Department of Justice. Points of view in this document are those of the authors and do not necessarily represent the official position or policies of the US Department of Justice. We thank Applied Biosystems for making their allele frequency database available to us and for supplying some of the TaqMan reagents that were employed in these studies. We also thank Eva Straka for excellent technical help. We also want to acknowledge and thank the following people who helped assemble the population samples from the diverse populations over a period of many years: C. Barta, F. L. Black, B. Bonne-Tamir, L. L. Cavalli-Sforza, K. Dumars, J. Friedlaender, L. Giuffra, E. L. Grigorenko, S. L. B. Kajuna, N. J. Karoma, K. Kendler, J.-J. Kim, W. Knowler, S. Kungulilo, H. Li, R.-B. Lu, A. Odunsi, F. Okonofua, F. Oronsaye, J. Parnas, L. Peltonen, H. Rajeevan, L. O. Schulz, D. Upson, K. Weiss, and O. V. Zhukova. In addition, some of the cell lines were obtained from the National Laboratory for the Genetics of Israeli Populations at Tel Aviv University, Israel, and the African American samples were obtained from the Coriell Institute for Medical Research, Camden, NJ. Special thanks are due to the many hundreds of individuals who volunteered to give blood samples for studies of gene frequency variation.