A computer simulation study on the number of loci and trees required to estimate genetic variability in cacao (Theobroma cacao L.)

Original Paper
Tree Genetics & Genomes

Abstract

Current methods for measuring the genetic diversity of populations and germplasm collections are often based on statistics calculated from molecular markers. The objective of this study was to investigate the precision and accuracy of the most common estimators of genetic variability and population structure, as calculated from simple sequence repeat (SSR) marker data from cacao (Theobroma cacao L.). Computer-simulated genomes of replicate populations were generated from initial allele frequencies estimated from SSR data on cacao accessions in a collection. The simulated genomes consisted of ten linkage groups, each 100 cM long. Heterozygosity, gene diversity and the F-statistics were studied as functions of the number of loci and the number of trees sampled. The results showed that relatively small random samples of trees were sufficient to obtain consistent estimates. In contrast, very large random samples of loci per linkage group were required to support reliable inferences about the whole genome. Increasing the sample size from one to five loci per linkage group (50 loci per genome) increased the precision of estimates by more than 50%, and increasing it to ten loci per linkage group (100 loci per genome) increased precision by up to 70%. Using fewer, highly polymorphic loci to analyze genetic variability produced estimates with substantially smaller variance but with an upward bias. Nevertheless, the relative differences in estimates among populations were generally consistent across the levels of polymorphism considered.
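The link between the number of loci sampled and the precision of a genome-wide estimate can be illustrated with a small simulation. The Python sketch below is a simplified stand-in, not the authors' simulation: it assumes unlinked loci in Hardy-Weinberg proportions (the study simulated linked loci on ten 100 cM linkage groups), draws hypothetical allele frequencies from a flat Dirichlet distribution instead of using the cacao SSR frequencies, and covers only observed heterozygosity and Nei's (1978) unbiased gene diversity, not the F-statistics. All function names and parameter values are hypothetical. It measures precision as the variance of the mean gene diversity across replicate samples of loci and trees.

```python
import random
import statistics


def random_allele_freqs(k, rng):
    """Hypothetical allele frequencies for one locus: a flat Dirichlet draw over k alleles."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_genotypes(freqs, n_trees, rng):
    """Draw n_trees diploid genotypes assuming Hardy-Weinberg proportions (unlinked locus)."""
    draws = rng.choices(range(len(freqs)), weights=freqs, k=2 * n_trees)
    return list(zip(draws[0::2], draws[1::2]))


def observed_heterozygosity(genotypes):
    """Fraction of sampled trees that are heterozygous at this locus."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def gene_diversity(genotypes):
    """Nei's (1978) unbiased gene diversity: 2n/(2n-1) * (1 - sum of p_i^2)."""
    two_n = 2 * len(genotypes)
    counts = {}
    for a, b in genotypes:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    homozygosity = sum((c / two_n) ** 2 for c in counts.values())
    return two_n / (two_n - 1) * (1.0 - homozygosity)


def mean_gene_diversity(genome, n_loci, n_trees, rng):
    """Average gene diversity over n_loci loci drawn at random from the genome."""
    loci = rng.sample(genome, n_loci)
    return statistics.fmean(
        [gene_diversity(sample_genotypes(f, n_trees, rng)) for f in loci]
    )


def variance_of_estimate(genome, n_loci, n_trees, reps, seed=1):
    """Variance of the genome-wide estimate across replicates (lower = more precise)."""
    rng = random.Random(seed)
    estimates = [mean_gene_diversity(genome, n_loci, n_trees, rng) for _ in range(reps)]
    return statistics.variance(estimates)


if __name__ == "__main__":
    rng = random.Random(2006)
    # 1000 hypothetical unlinked loci with 2-12 alleles each; "10 groups x 100"
    # is used only to count loci the way the abstract does, not to model linkage.
    genome = [random_allele_freqs(rng.randint(2, 12), rng) for _ in range(1000)]
    for per_group in (1, 5, 10):
        n_loci = 10 * per_group
        v = variance_of_estimate(genome, n_loci, n_trees=30, reps=100)
        print(f"{per_group:>2} loci per group ({n_loci:>3} loci): variance = {v:.6f}")
```

Under these assumptions the variance for 100 loci should come out roughly an order of magnitude smaller than for 10 loci, because averaging over more loci shrinks the between-locus component of sampling error; exact values depend on the seed and on the hypothetical genome.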


Fig. 1
Fig. 2
Fig. 3


Acknowledgement

We thank Dr. James B. Holland for his critical review of this manuscript and for his suggestions.

Author information

Correspondence to Cuauhtemoc Cervantes-Martinez.


About this article

Cite this article

Cervantes-Martinez, C., Brown, J.S., Schnell, R. et al. A computer simulation study on the number of loci and trees required to estimate genetic variability in cacao (Theobroma cacao L.). Tree Genetics & Genomes 2, 152–164 (2006). https://doi.org/10.1007/s11295-006-0038-0


  • DOI: https://doi.org/10.1007/s11295-006-0038-0
