Abstract
This study addresses issues of sample size and marker location that have arisen from the discovery of SNPs in the genomes of poorly characterized primate species and the application of these markers to primate population genetics. First, we predict the effect of discovery sample size on the probability of discovering both rare and common SNPs and compare this prediction with the proportions of common and rare SNPs discovered when different numbers of individuals are sequenced. Second, we examine the effect of genomic region on common population genetic estimates by comparing markers from coding and non-coding regions of the rhesus macaque genome, to measure the degree and direction of bias introduced by SNPs located in coding versus non-coding regions. We found that both discovery sample size and genomic region surveyed affect SNP marker attributes and population genetic estimates, even when these are calculated from an expanded data set containing more individuals than the original discovery data set. Although none of the SNP detection methods or genomic regions tested in this study was completely uninformative, these results show that each captures a different kind of genetic variation, is suited to different purposes, and introduces specific types of bias. Given that each SNP marker has an individual evolutionary history, we calculated that the most complete and unbiased representation of the genetic diversity present in the individual can be obtained by incorporating at least 10 individuals into the discovery sample set, to ensure the discovery of both common and rare polymorphisms.
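The dependence of SNP discovery on sample size can be illustrated with a standard textbook calculation. The sketch below is not necessarily the prediction method used in this study. It assumes that the 2n chromosomes of n diploid individuals are sampled independently from a population in which the minor allele has frequency p, so that a SNP is detected whenever both alleles appear in the sample. The function name and the example frequencies (0.05 for a rare SNP, 0.30 for a common one) are illustrative assumptions.

```python
def detection_probability(maf: float, n_individuals: int) -> float:
    """Probability that a biallelic SNP appears polymorphic in the sample.

    Assumes 2 * n_individuals chromosomes drawn independently, with minor
    allele frequency `maf`. The SNP is missed only if every sampled
    chromosome carries the same allele.
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("maf must lie in [0, 0.5]")
    if n_individuals < 1:
        raise ValueError("n_individuals must be at least 1")
    chromosomes = 2 * n_individuals
    all_minor = maf ** chromosomes          # every chromosome carries the minor allele
    all_major = (1.0 - maf) ** chromosomes  # every chromosome carries the major allele
    return 1.0 - all_minor - all_major


if __name__ == "__main__":
    # Illustrative frequencies only; these values are not taken from the study.
    for n in (1, 2, 5, 10, 20):
        rare = detection_probability(0.05, n)
        common = detection_probability(0.30, n)
        print(f"n={n:2d}  rare (p=0.05): {rare:.3f}  common (p=0.30): {common:.3f}")
```

Under these assumptions, common SNPs are detected with high probability even in small discovery panels, whereas the detection probability for rare SNPs rises much more slowly with the number of individuals sequenced.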
Acknowledgments
The authors would like to acknowledge the laboratory assistance provided by Debra George and Joy Erickson. This work was funded by a grant from the National Institutes of Health, no. RR05090, awarded to DGS.
About this article
Cite this article
Trask, J.A.S., Malhi, R.S., Kanthaswamy, S. et al. The effect of SNP discovery method and sample size on estimation of population genetic data for Chinese and Indian rhesus macaques (Macaca mulatta). Primates 52, 129–138 (2011). https://doi.org/10.1007/s10329-010-0232-4