Measuring the representativeness of a germplasm collection

  • Original Paper
  • Biodiversity and Conservation

Abstract

Many germplasm collections aim to preserve most of the genetic diversity present in a population so that the population could be regenerated, providing genetic resources that help ensure food security. This paper proposes a way to measure how well a germplasm collection achieves this goal. In the most common scenario, little is known about the number and statistical distribution of alleles at every locus, which makes it very difficult to measure the representativeness of an accession. We show how to use samples of allelic diversity at a sample of loci to estimate the representativeness of an accession from the coverage of a sample, with both point and interval estimates. Our approach avoids unrealistic assumptions about the number of loci, the bounds for the number of alleles, or their frequency distributions. Depending on the sampling scheme of a collection, we differentiate between absolute and relative coverage. We demonstrate this methodology using data from the germplasm collection at the Leibniz Institute of Plant Genetics and Crop Plant Research.
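To make the notion of sample coverage concrete, the sketch below computes the classical point estimate of Good (1953), one minus the fraction of the sample made up of alleles seen exactly once. This is only an illustration: the abstract does not state which estimator the paper uses, and the function name and the allele labels are hypothetical.

```python
from collections import Counter


def good_coverage_estimate(alleles):
    """Good's (1953) estimate of sample coverage: 1 - f1 / n.

    f1 is the number of alleles observed exactly once (singletons)
    and n is the total number of observations in the sample.
    """
    n = len(alleles)
    if n == 0:
        raise ValueError("sample must not be empty")
    counts = Counter(alleles)
    singletons = sum(1 for c in counts.values() if c == 1)
    return 1.0 - singletons / n


# Hypothetical sample of 10 allele observations at one locus:
# a1 appears 4 times, a2 appears 3 times, a3, a4 and a5 once each.
sample = ["a1", "a1", "a1", "a2", "a2", "a3", "a4", "a1", "a2", "a5"]
estimate = good_coverage_estimate(sample)
print(f"estimated coverage: {estimate:.2f}")
```

With many singletons in the sample, the estimate is low, which signals that a further draw from the population is likely to reveal an allele not yet in the collection.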

References

  • Brown AHD (1995) The core collection at the crossroads. In: Hodgkin T, Brown AHD, van Hintum TJL, Morales EAV (eds) Core collections of plant genetic resources. Wiley, Chichester, pp 3–19

  • Chao A (1981) On estimating the probability of discovering a new species. Ann Stat 9(6):1339–1342

  • Chao A, Lee SM (1992) Estimating the number of classes via sample coverage. J Am Stat Assoc 87(417):210–217

  • Chao A, Lee SM (1993) Estimating population size for continuous-time capture-recapture models via sample coverage. Biom J 35(1):29–45

  • Darwin C (1866) On the origin of species by means of natural selection: or the preservation of favoured races in the struggle for life. John Murray, London

  • Esty WW (1982) Confidence intervals for the coverage of low coverage samples. Ann Stat 10(1):190–196

  • Esty WW (1983) A normal limit law for a nonparametric estimator of the coverage of a random sample. Ann Stat 11(3):905–912

  • Esty W (1985) Estimation of the number of classes in a population and the coverage of a sample. Math Sci 10:41–50

  • Esty WW (1986) The efficiency of Good's nonparametric coverage estimator. Ann Stat 14(3):1257–1260

  • Good IJ (1953) The population frequencies of species and the estimation of population parameters. Biometrika 40(3–4):237–264

  • Good I, Toulmin G (1956) The number of new species, and the increase in population coverage, when a sample is increased. Biometrika 43(1–2):45–63

  • Harris B (1959) Determining bounds on integrals with applications to cataloging problems. Ann Math Stat 30(2):521–548

  • Huang SP, Weir B (2001) Estimating the total number of alleles using a sample coverage method. Genetics 159(3):1365–1373

  • Huang X, Börner A, Röder M, Ganal M (2002) Assessing genetic diversity of wheat (Triticum aestivum L.) germplasm using microsatellite markers. Theor Appl Genet 105(5):699–707

  • Knott M (1967) Models for cataloguing problems. Ann Math Stat 38(4):1255–1260

  • Lee SM, Chao A (1994) Estimating population size via sample coverage for closed capture-recapture models. Biometrics 50(1):88–97

  • Lo SH (1992) From the species problem to a general coverage problem via a new interpretation. Ann Stat 20(2):1094–1109

  • Nei M (1973) Analysis of gene diversity in subdivided populations. Proc Natl Acad Sci 70(12):3321–3323

  • Robbins HE (1968) Estimating the total probability of the unobserved outcomes of an experiment. Ann Math Stat 39(1):256–257

  • Starr N (1979) Linear estimation of the probability of discovering a new species. Ann Stat 7(3):644–652

  • van Hintum TJ, Brown AHD, Spillane C, Hodgkin T (2000) Core collections of plant genetic resources. IPGRI Technical Bulletin No. 3, Rome, Italy

  • Zhang C-H, Zhang Z (2009) Asymptotic normality of a nonparametric estimator of sample coverage. Ann Stat 37:2582–2595

Acknowledgements

The author would like to thank Dr. Marion Röder, who kindly shared the data set used in Huang et al. (2002).

Author contributions

Carlos Hernandez-Suarez developed the methodology, performed the simulations, and wrote the manuscript.

Author information

Corresponding author

Correspondence to Carlos Hernandez-Suarez.

Ethics declarations

Conflict of interest

The author declares no conflict of interest.

Additional information

Communicated by David Hawksworth.

This article belongs to the Topical Collection: Ex-situ conservation.

Appendix: Proof of properties of the coverage of several populations

  1. If we select an individual at random from the population and then select one of its attributes X or Y, this attribute will be included in the sample with respective probabilities \(C_X\) and \(C_Y\). Because the selected attribute is equally likely to be X or Y, the probability that the attribute selected is in the sample is \((C_X + C_Y)/2\).

  2. If two populations of sizes \(N_1\) and \(N_2\) are mixed and a sample of size n is taken from the mix, the coverage of the sample is the probability that an individual selected at random from the mixed population is represented in the sample. A randomly selected individual from the mix belongs to each initial population with respective probabilities \(f = N_1/(N_1 + N_2)\) and \(1-f = N_2/(N_1 + N_2)\). It follows that there is no need to mix the populations, as long as each population is sampled with a sample size proportional to its population size, \(n_1 = fn\) and \(n_2 = (1-f)n\), respectively.

  3. Suppose we have two populations 1 and 2 of sizes \(N_1\) and \(N_2\), respectively, where \(N_1 = N_2\). Suppose we take a sample of size n from each population and let C represent the coverage of the mixed sample of size 2n. By property 2, C can be interpreted as the probability that a random individual selected from the mix of both populations is represented in the sample, i.e., the absolute coverage. Now suppose that the size of population 2 is increased by a factor of k, where \(k > 1\), keeping the relative frequency of alleles fixed. Clearly, the previous interpretation of the coverage (absolute coverage) no longer holds because an individual selected randomly from the mixture of populations 1 and 2 is k times more likely to come from population 2. But if we can guarantee that the individual selected is equally likely to come from either population, then the probability that this individual is already represented in the sample is still C. The restriction imposed by requiring that it must be equally likely that the individual comes from either population defines the relative coverage. It follows that if we have two populations of general sizes \(N_1\) and \(N_2\), \(N_1 \ne N_2\), and take a sample of the same size n from each population, the coverage of the sample mix follows the definition of relative coverage.

  4. This property follows from properties of random sampling.
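The distinction between absolute and relative coverage in properties 2 and 3 can be checked numerically. The sketch below is a minimal illustration, not the paper's estimation procedure: it assumes the true allele frequencies of two populations are known, and all names and numbers (`pop_1`, `pop_2`, `observed`, the population sizes) are hypothetical. The coverage of a sample within one population is taken to be the total frequency of the alleles that appear in the sample.

```python
def coverage(freqs, observed):
    """Total population frequency of the alleles present in the sample."""
    return sum(p for allele, p in freqs.items() if allele in observed)


def mixed_coverage(freqs_1, freqs_2, weight_1, observed):
    """Probability that a random individual is represented in the sample
    when it comes from population 1 with probability weight_1 and from
    population 2 with probability 1 - weight_1."""
    return (weight_1 * coverage(freqs_1, observed)
            + (1 - weight_1) * coverage(freqs_2, observed))


# Hypothetical allele frequencies at one locus in two populations.
pop_1 = {"a": 0.6, "b": 0.3, "c": 0.1}
pop_2 = {"a": 0.2, "b": 0.2, "d": 0.6}

# Hypothetical set of alleles found in the pooled sample.
observed = {"a", "b"}

# Absolute coverage: weight each population by its share of the mix,
# f = N_1 / (N_1 + N_2), as in property 2.
N_1, N_2 = 1000, 3000
f = N_1 / (N_1 + N_2)
absolute = mixed_coverage(pop_1, pop_2, f, observed)

# Relative coverage: both populations equally likely, as in property 3.
relative = mixed_coverage(pop_1, pop_2, 0.5, observed)

print(f"absolute coverage: {absolute:.3f}")
print(f"relative coverage: {relative:.3f}")
```

Because population 2 is three times larger and is covered less well by the sample, the absolute coverage falls below the relative coverage in this example. With equal weights, the same function reproduces the \((C_X + C_Y)/2\) average of property 1.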

About this article

Cite this article

Hernandez-Suarez, C. Measuring the representativeness of a germplasm collection. Biodivers Conserv 27, 1471–1486 (2018). https://doi.org/10.1007/s10531-018-1504-3

  • DOI: https://doi.org/10.1007/s10531-018-1504-3
