Measuring the representativeness of a germplasm collection

Hernandez-Suarez, Carlos

doi:10.1007/s10531-018-1504-3

Original Paper
Published: 20 January 2018

Volume 27, pages 1471–1486, (2018)
Biodiversity and Conservation

Carlos Hernandez-Suarez ORCID: orcid.org/0000-0003-2216-0905¹

Abstract

Many germplasm collections aim to preserve most of the genetic diversity present in a population so that the population can be regenerated, thereby providing genetic resources that help ensure food security. This paper proposes a way to measure how well a germplasm collection achieves this goal. In the most common scenario, little is known about the number and statistical distribution of alleles at each locus, which makes it very difficult to measure the representativeness of an accession. Here, we show how to use the allelic diversity observed at a sample of loci to obtain point and interval estimates of the representativeness of an accession, based on the coverage of a sample. Our approach avoids making unrealistic assumptions about the number of loci, the bounds on the number of alleles, or their frequency distributions. Depending on how a collection was sampled, we distinguish between absolute and relative coverage. Finally, we demonstrate the methodology using data from the germplasm collection at the Leibniz Institute of Plant Genetics and Crop Plant Research.
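
The abstract refers to point and interval estimates of sample coverage. As a minimal sketch, and not necessarily the estimator developed in the paper, the classic Good (1953) estimator of coverage at one locus is \(\hat C = 1 - F_1/n\), where n is the number of gene copies sampled and \(F_1\) is the number of alleles seen exactly once. The interval assumes a normal approximation with variance term \((F_1 + 2F_2 - F_1^2/n)/n^2\), as associated with Esty (1983), where \(F_2\) counts alleles seen exactly twice. Averaging per-locus estimates with equal weights follows property 1 of the appendix. The locus names and allele calls are hypothetical, and with samples this small the asymptotic interval is only illustrative.

```python
import math
from collections import Counter


def good_coverage(sample):
    """Good's (1953) coverage estimator 1 - F1/n for one locus.

    `sample` holds one allele label per sampled gene copy; F1 is the number
    of alleles observed exactly once."""
    counts = Counter(sample)
    n = sum(counts.values())
    f1 = sum(1 for c in counts.values() if c == 1)
    return 1.0 - f1 / n


def coverage_interval(sample, z=1.96):
    """Normal-approximation interval for coverage, clipped to [0, 1].

    Assumes the variance term (F1 + 2*F2 - F1**2/n) / n**2 (Esty 1983)."""
    counts = Counter(sample)
    n = sum(counts.values())
    f1 = sum(1 for c in counts.values() if c == 1)
    f2 = sum(1 for c in counts.values() if c == 2)
    c_hat = 1.0 - f1 / n
    se = math.sqrt(f1 + 2 * f2 - f1 ** 2 / n) / n
    return max(0.0, c_hat - z * se), min(1.0, c_hat + z * se)


def mean_coverage(loci):
    """Unweighted mean of the per-locus coverage estimates."""
    return sum(good_coverage(s) for s in loci.values()) / len(loci)


# Hypothetical allele calls (one entry per gene copy) at two sampled loci.
loci = {
    "locus1": ["a", "a", "b", "c", "c", "c", "d"],
    "locus2": ["a", "a", "b", "b", "b", "c"],
}
for name, sample in loci.items():
    print(name, round(good_coverage(sample), 3), coverage_interval(sample))
print("mean", round(mean_coverage(loci), 3))  # mean of 0.714 and 0.833 -> 0.774
```

Equal weighting treats every sampled locus as equally informative; a different weighting would change what the averaged coverage represents.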

References

Brown AHD (1995) The core collection at the crossroads. In: Hodgkin T, Brown AHD, van Hintum TJL, Morales EAV (eds) Core collections of plant genetic resources. Wiley, Chichester, pp 3–19
Chao A (1981) On estimating the probability of discovering a new species. Ann Stat 9(6):1339–1342
Chao A, Lee SM (1992) Estimating the number of classes via sample coverage. J Am Stat Assoc 87(417):210–217
Chao A, Lee SM (1993) Estimating population size for continuous-time capture-recapture models via sample coverage. Biom J 35(1):29–45
Darwin C (1866) On the origin of species by means of natural selection: or the preservation of favoured races in the struggle for life. John Murray, London
Esty WW (1982) Confidence intervals for the coverage of low coverage samples. Ann Stat 10(1):190–196
Esty WW (1983) A normal limit law for a nonparametric estimator of the coverage of a random sample. Ann Stat 11(3):905–912
Esty WW (1985) Estimation of the number of classes in a population and the coverage of a sample. Math Sci 10:41–50
Esty WW (1986) The efficiency of Good’s nonparametric coverage estimator. Ann Stat 14(3):1257–1260
Good IJ (1953) The population frequencies of species and the estimation of population parameters. Biometrika 40(3–4):237–264
Good IJ, Toulmin GH (1956) The number of new species, and the increase in population coverage, when a sample is increased. Biometrika 43(1–2):45–63
Harris B (1959) Determining bounds on integrals with applications to cataloging problems. Ann Math Stat 30(2):521–548
Huang SP, Weir BS (2001) Estimating the total number of alleles using a sample coverage method. Genetics 159(3):1365–1373
Huang X, Börner A, Röder M, Ganal M (2002) Assessing genetic diversity of wheat (Triticum aestivum L.) germplasm using microsatellite markers. Theor Appl Genet 105(5):699–707
Knott M (1967) Models for cataloguing problems. Ann Math Stat 38(4):1255–1260
Lee SM, Chao A (1994) Estimating population size via sample coverage for closed capture-recapture models. Biometrics 50(1):88–97
Lo SH (1992) From the species problem to a general coverage problem via a new interpretation. Ann Stat 20(2):1094–1109
Nei M (1973) Analysis of gene diversity in subdivided populations. Proc Natl Acad Sci 70(12):3321–3323
Robbins HE (1968) Estimating the total probability of the unobserved outcomes of an experiment. Ann Math Stat 39(1):256–257
Starr N (1979) Linear estimation of the probability of discovering a new species. Ann Stat 7(3):644–652
van Hintum TJL, Brown AHD, Spillane C, Hodgkin T (2000) Core collections of plant genetic resources. IPGRI Technical Bulletin No. 3. IPGRI, Rome
Zhang C-H, Zhang Z (2009) Asymptotic normality of a nonparametric estimator of sample coverage. Ann Stat 37:2582–2595

Acknowledgements

The author would like to thank Dr. Marion Röder, who kindly shared the data set used in Huang et al. (2002).

Author contributions

Carlos Hernandez-Suarez developed the methodology, performed the simulations, and wrote the manuscript.

Author information

Authors and Affiliations

Facultad de Ciencias, Universidad de Colima, Bernal Diaz del Castillo 340, Colima, México
Carlos Hernandez-Suarez

Corresponding author

Correspondence to Carlos Hernandez-Suarez.

Ethics declarations

Conflict of interest

The author declares no conflict of interest.

Additional information

Communicated by David Hawksworth.

This article belongs to the Topical Collection: Ex-situ conservation.

Appendix: Proof of properties of the coverage of several populations

1.
If we select an individual at random from the population and then select one of its two attributes, X or Y, at random, the selected attribute is represented in the sample with probability \(C_X\) if it is X and \(C_Y\) if it is Y. Because each attribute is selected with probability 1/2, by the law of total probability the selected attribute is represented in the sample with probability \((C_X + C_Y)/2\).
2.
If two populations of sizes \(N_1\) and \(N_2\) are mixed and a sample of size n is taken from the mix, the coverage of the sample is the probability that an individual selected at random from the mixed population is represented in the sample. A randomly selected individual from the mix belongs to populations 1 and 2 with respective probabilities \(f = N_1/(N_1 + N_2)\) and \(1 - f = N_2/(N_1 + N_2)\). It follows that the two populations need not be physically mixed, as long as each is sampled in proportion to its size, that is, with sample sizes \(n_1 = nf = nN_1/(N_1 + N_2)\) and \(n_2 = n(1 - f) = nN_2/(N_1 + N_2)\), respectively.
3.
Suppose we have two populations, 1 and 2, of sizes \(N_1\) and \(N_2\), respectively, where \(N_1 = N_2\). Suppose we take a sample of size n from each population and let C denote the coverage of the pooled sample of size 2n. By property 2, C can be interpreted as the probability that an individual selected at random from the mix of both populations is represented in the sample, i.e., the absolute coverage. Now suppose that the size of population 2 is increased by a factor k, where \(k > 1\), while the relative allele frequencies are kept fixed. The previous interpretation (absolute coverage) no longer holds, because an individual selected at random from the mixture of populations 1 and 2 is now k times as likely to come from population 2 as from population 1. However, if we require the selected individual to be equally likely to come from either population, the probability that this individual is already represented in the sample is still C. This restriction, that the individual is equally likely to come from either population, defines the relative coverage. It follows that if two populations have general sizes \(N_1\) and \(N_2\), \(N_1 \ne N_2\), and a sample of the same size n is taken from each, the coverage of the pooled sample is a relative coverage.
4.
This property follows from the properties of random sampling.
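
Properties 2 and 3 can be checked numerically. The sketch below is hypothetical: it assumes the allele frequencies of both populations are known (in practice they are not), treats populations as large enough that each draw behaves like sampling with replacement, and uses made-up allele labels, frequencies and population sizes. Absolute coverage weights each population's coverage by \(f = N_1/(N_1 + N_2)\); relative coverage weights both populations equally. Simulating a randomly selected individual under each weighting recovers the corresponding quantity, and when \(N_1 = N_2\) the two coincide.

```python
import random


def coverage(freqs, observed):
    """Total frequency, under `freqs`, of the alleles present in `observed`."""
    return sum(p for allele, p in freqs.items() if allele in observed)


def absolute_and_relative(pop1, pop2, size1, size2, observed):
    """Return (absolute, relative) coverage of the pooled sample `observed`.

    Absolute coverage weights populations by size; relative weights them equally."""
    c1, c2 = coverage(pop1, observed), coverage(pop2, observed)
    f = size1 / (size1 + size2)  # probability a random individual is from population 1
    return f * c1 + (1 - f) * c2, 0.5 * (c1 + c2)


def simulate(pop1, pop2, weight1, observed, trials, rng):
    """Fraction of random individuals whose allele is already in `observed`.

    Each individual comes from population 1 with probability `weight1`."""
    hits = 0
    for _ in range(trials):
        pop = pop1 if rng.random() < weight1 else pop2
        allele = rng.choices(list(pop), weights=list(pop.values()))[0]
        hits += allele in observed
    return hits / trials


pop1 = {"A": 0.5, "B": 0.3, "C": 0.2}  # hypothetical allele frequencies
pop2 = {"A": 0.1, "D": 0.6, "E": 0.3}
observed = {"A", "B", "D"}             # alleles found in the pooled sample
size1, size2 = 100, 300                # population 2 is k = 3 times larger

absolute, relative = absolute_and_relative(pop1, pop2, size1, size2, observed)
print(round(absolute, 3), round(relative, 3))  # 0.725 0.75
rng = random.Random(1)
print(simulate(pop1, pop2, size1 / (size1 + size2), observed, 100_000, rng))  # near absolute
print(simulate(pop1, pop2, 0.5, observed, 100_000, rng))  # near relative
```

Because the relative coverage uses equal weights, scaling the size of population 2 by k leaves it unchanged, which is the point of property 3; only the absolute coverage moves.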

About this article

Cite this article

Hernandez-Suarez, C. Measuring the representativeness of a germplasm collection. Biodivers Conserv 27, 1471–1486 (2018). https://doi.org/10.1007/s10531-018-1504-3

Received: 03 May 2017
Revised: 07 January 2018
Accepted: 15 January 2018
Published: 20 January 2018
Issue Date: May 2018
DOI: https://doi.org/10.1007/s10531-018-1504-3
