Abstract
Current methods for measuring the genetic diversity of populations and germplasm collections are often based on statistics calculated from molecular markers. The objective of this study was to investigate the precision and accuracy of the most common estimators of genetic variability and population structure, as calculated from simple sequence repeat (SSR) marker data from cacao (Theobroma cacao L.). Computer-simulated genomes of replicate populations were generated from initial allele frequencies estimated using SSR data from cacao accessions in a collection. The simulated genomes consisted of ten linkage groups, each 100 cM in length. Heterozygosity, gene diversity and the F statistics were studied as a function of the number of loci and trees sampled. The results showed that relatively small random samples of trees were needed to achieve consistency in the observed estimates. In contrast, very large random samples of loci per linkage group were required to enable reliable inferences on the whole genome. Precision of estimates increased by more than 50% when the sample size increased from one to five loci per linkage group (50 loci per genome), and by up to 70% with ten loci per linkage group (100 loci per genome). The use of fewer, highly polymorphic loci to analyze genetic variability led to estimates with substantially smaller variance but with an upward bias. Nevertheless, the relative differences of estimates among populations were generally consistent for the different levels of polymorphism considered.
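The dependence of precision on the number of loci sampled can be illustrated with a minimal Python sketch. It is not the simulation used in the study: it assumes unlinked loci with no linkage groups, random mating, and hypothetical allele frequencies drawn at random rather than estimated from cacao SSR data. The function names (gene_diversity, simulate_locus, spread_of_estimate) and all parameter values are illustrative assumptions. The per-locus statistic is the unbiased gene diversity of Nei (1978), averaged over loci.

```python
import random
import statistics


def gene_diversity(alleles):
    """Unbiased gene diversity (Nei 1978) for one locus.

    `alleles` is the list of sampled allele copies (two per diploid tree).
    """
    n = len(alleles)
    counts = {}
    for a in alleles:
        counts[a] = counts.get(a, 0) + 1
    homozygosity = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - homozygosity)


def simulate_locus(rng, n_trees, n_alleles=5):
    """Draw 2 * n_trees allele copies at one hypothetical locus."""
    # Assumption: allele frequencies are normalized gamma draws, not real SSR data.
    weights = [rng.gammavariate(1.0, 1.0) for _ in range(n_alleles)]
    # Assumption: random mating, so allele copies are sampled independently.
    return rng.choices(range(n_alleles), weights=weights, k=2 * n_trees)


def spread_of_estimate(n_loci, n_trees, n_reps=200, seed=1):
    """Standard deviation, across replicates, of the mean gene diversity."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        per_locus = [
            gene_diversity(simulate_locus(rng, n_trees)) for _ in range(n_loci)
        ]
        estimates.append(statistics.mean(per_locus))
    return statistics.stdev(estimates)


if __name__ == "__main__":
    # Compare the spread of the genome-wide estimate for several numbers of loci.
    for n_loci in (10, 50, 100):
        print(n_loci, round(spread_of_estimate(n_loci, n_trees=30), 4))
```

Under these simplifying assumptions the spread of the averaged estimate shrinks as more loci are sampled. The sketch ignores linkage and population structure, so it should not be expected to reproduce the percentages reported above.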
Acknowledgement
We acknowledge Dr. James B. Holland for his critical review of this manuscript and suggestions.
Cite this article
Cervantes-Martinez, C., Brown, J.S., Schnell, R. et al. A computer simulation study on the number of loci and trees required to estimate genetic variability in cacao (Theobroma cacao L.). Tree Genetics & Genomes 2, 152–164 (2006). https://doi.org/10.1007/s11295-006-0038-0
DOI: https://doi.org/10.1007/s11295-006-0038-0