Abstract
Genetic diversity parameters are used by plant breeders to develop efficient genetic resources sampling and conservation strategies. Extending previous developments on the use of ANOVA on allele frequencies in a fixed set of populations, we show that this approach allows unbiased estimation of diversity parameters, including Nei’s diversity parameters, HS, the within-population diversity; DST, the between-population differentiation; HT, the total gene diversity; and other related parameters well suited for guiding conservation decisions. We consider two cases: selfing plants and outcrossing plants. For outcrossing plants, this approach also allows the estimation of the average frequency of heterozygotes (H0) and average departure of populations from a random mating equilibrium. These unbiased ANOVA estimators correspond to those derived by Nei and Chesser (Ann Hum Genet 47:253–259, 1983) by using properties related to the multinomial sampling of genotypes. With an equal number of individuals sampled per population, we first developed analyses of variation for each allele at one locus. Then, considering the whole set of alleles, we show the correspondence between the sum of the variances in allele frequencies over the alleles and Nei’s within- and between-population diversities. Considering large populations leads to Nei’s relationship, HT = HS + DST, which is a decomposition of the total variance in allele frequencies into within- and between-population variance components, variance meaning the sum of the variances of each allele over the whole set of alleles. Finally, we use theoretical results of the ANOVA approach to consider a genetic resources conservation design with only one individual per population, which allows Nei’s total gene diversity to be maintained.
References
Brown AHD, Munday J (1982) Population-genetic structure and optimal sampling of land races of barley from Iran. Genetica 58:85–96
Cockerham CC (1969) Variance of gene frequencies. Evolution 23:72–84
Cockerham CC (1973) Analyses of gene frequencies. Genetics 74:679–700
Cockerham CC, Weir B (1986) Estimation of inbreeding parameters in stratified populations. Ann Hum Genet 50:271–281
Excoffier L, Heckel G (2006) Computer programs for population genetics data analysis: a survival guide. Nat Rev Genet 7:745–758
Gouesnard B, Bataillon TM, Decoux G, Rozale C, Schoen DJ, David JL (2001) MSTRAT: an algorithm for building germ plasm core collections by maximizing allelic or phenotypic richness. J Hered 92:93–94
Gregorius HR (1988) The meaning of genetic variation within and between subpopulations. Theor Appl Genet 76:947–951
Holsinger KE, Weir B (2009) Genetics in geographically structured populations: defining, estimating and interpreting FST. Nat Rev Genet 10:639–650
Lefèvre F, Gallais A (2020) Partitioning expected heterozygosity in subdivided populations: some misuses of Nei’s decomposition and an alternative probabilistic approach. Mol Ecol 29:2957–2962
Nei M (1973) Analysis of gene diversity in subdivided populations. Proc Natl Acad Sci USA 70:3321–3323
Nei M (1986) Definition and estimation of fixation indices. Evolution 40:643–645
Nei M, Chakravarti A (1977) Drift variances of FST and GST statistics obtained from a finite number of isolated populations. Theor Popul Biol 11:307–325
Nei M, Chesser RK (1983) Estimation of fixation indices and gene diversities. Ann Hum Genet 47:253–259
Ollivier L, Foulley JL (2005) Aggregate diversity: new approach combining within- and between-breed genetic diversity. Livest Prod Sci 95:247–254
Petit RJ, El Moussadik A, Pons O (1998) Identifying populations for conservation on the basis of genetic markers. Conserv Biol 12:844–855
Pons O, Chaouche K (1995) Estimation, variance and optimal sampling of gene diversity II. Diploid locus. Theor Appl Genet 91:122–130
Pons O, Petit RJ (1995) Estimation, variance and optimal sampling of gene diversity I. Haploid locus. Theor Appl Genet 90:462–470
Scheffé H (1959) The analysis of variance. Wiley, New York, 477 p
Van Treuren R, Engels J, Hoekstra R, Van Hintum T (2009) Optimization of the composition of crop collections for ex situ conservation. Plant Genet Resour 7:185–193
Weir BS, Cockerham CC (1984) Estimating F-statistics for the analysis of population structure. Evolution 38:1358–1370
Weir BS (1996) Genetic data analysis II. Methods for discrete population genetic data. Sinauer Associates, Sunderland, Massachusetts, 445 p
Wright S (1951) The genetical structure of populations. Ann Eugen 15:323–354
Wright S (1965) The interpretation of population structure by F-statistics with special regard to systems of mating. Evolution 19:395–420
Yonezawa K (1985) A definition of the optimal allocation of effort in conservation of plant genetic resources with application to sample size determination for field collection. Euphytica 34:345–354
Funding
No funding was received for conducting this study.
Author information
Contributions
AG and FL have contributed to the content and the writing of the article.
Ethics declarations
Conflict of interest
André Gallais and François Lefèvre declare that they have no conflict of interest.
Consent to participate
Both authors agreed on the research and the manuscript.
Consent for publication
Both authors consent to submit the manuscript for publication in Euphytica.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Derivation of the expected mean squares for the ANOVA on allele frequencies
Case of selfing plants
Using the indicator variable xij such that xij = 1 if a given allele k is present in individual j of population i and xij = 0 otherwise, the sums of squares (SSk) of the ANOVA corresponding to model (1) are, for a given allele k, the following, where \(x_{i.}\) denotes the mean of xij over the n individuals of population i and \(x_{..}\) the overall mean:
- SSk between populations (pop) = (SSpop)k = \(n\sum\nolimits_{i} x_{i.}^{2} - ns\,x_{..}^{2}\), with (s − 1) degrees of freedom (df);
- SSk within populations (ind/pop) = (SSind/pop)k = \(\sum\nolimits_{ij} x_{ij}^{2} - n\sum\nolimits_{i} x_{i.}^{2}\), with s(n − 1) df;
- SSk total = (SSTot)k = \(\sum\nolimits_{ij} x_{ij}^{2} - ns\,x_{..}^{2}\), with (sn − 1) df.
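The three sums of squares above can be computed directly from a table of indicator values. The following is a minimal sketch rather than the authors' code; the function name `anova_sums_of_squares` and the example data are illustrative assumptions, and an equal number n of individuals per population is assumed, as in the text.

```python
from fractions import Fraction


def anova_sums_of_squares(x):
    """One-way ANOVA sums of squares for one allele.

    x: list of s lists, each holding the n indicator values (0 or 1)
       of the individuals sampled in one population.
    Returns (SS_pop, SS_ind/pop, SS_Tot) as exact fractions.
    """
    s = len(x)
    n = len(x[0])
    # x_i. : mean of the indicator within each population
    pop_means = [Fraction(sum(row), n) for row in x]
    # x_.. : overall mean (equal n, so it is the mean of the population means)
    grand_mean = sum(pop_means) / s
    sum_sq = sum(v * v for row in x for v in row)
    between = n * sum(m * m for m in pop_means)
    correction = n * s * grand_mean ** 2
    return between - correction, sum_sq - between, sum_sq - correction


# Hypothetical data: s = 3 populations, n = 4 individuals each.
example = [[1, 1, 0, 1], [0, 0, 1, 0], [1, 1, 1, 1]]
ss_pop, ss_within, ss_total = anova_sums_of_squares(example)
```

With exact fractions, the additivity SSpop + SSind/pop = SSTot can be checked without rounding error.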
The corresponding expected mean squares for a fixed set of populations and a given allele k, E(MS)k = E(SSk/df), i.e. E(MSpop)k, E(MSind/pop)k and E(MSTot)k respectively, are derived by denoting by pi the frequency of the considered allele in population i and by using the following expectations (for an indicator variable, \(x_{ij}^{2} = x_{ij}\)): Ej(xij) = pi, Eij(xij) = \(E_{ij} (x_{ij}^{2} )\) = \(E_{i} (x_{i.} ) = E(x_{..} ) = \sum\nolimits_{i} {p_{i} } /s = \overline{p}\).
Expectation E(MSind/pop)k = \(\sigma_{w(k)}^{2}\)
For a given allele k, we can write
E(MSind/pop)k = \(E\left[\sum\nolimits_{ij} x_{ij}^{2} - \sum\nolimits_{i} \left(\sum\nolimits_{j} x_{ij}\right)^{2}/n\right]/s(n - 1)\)
or with the introduction of the subscript k for the considered allele
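A worked version of this step can be reconstructed from the expectations stated above. It is a sketch under the assumption that the n individuals of a population are sampled independently, so that \(E(x_{ij} x_{ij^{\prime}}) = p_{ik}^{2}\) for \(j^{\prime} \ne j\):

\(E\left[\sum\nolimits_{ij} x_{ij}^{2}\right] = n\sum\nolimits_{i} p_{ik}\)

\(E\left[\sum\nolimits_{i} \left(\sum\nolimits_{j} x_{ij}\right)^{2}/n\right] = \sum\nolimits_{i} p_{ik} + (n - 1)\sum\nolimits_{i} p_{ik}^{2}\)

\(E(MS_{ind/pop})_{k} = (n - 1)\sum\nolimits_{i} p_{ik}(1 - p_{ik})/s(n - 1) = \sum\nolimits_{i} p_{ik}(1 - p_{ik})/s = \sigma_{w(k)}^{2} = H_{S(k)}\)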
Expectation E(MSpop)k = E[(SSpop)k/(s − 1)]
\(E\left(\sum\nolimits_{ij} x_{ij}\right)^{2}/ns = \left[E\left(\sum\nolimits_{ij} x_{ij}^{2}\right) + E\left(\sum\nolimits_{i} \sum\nolimits_{jj^{\prime}} x_{ij} x_{ij^{\prime}}\right) + E\left(\sum\nolimits_{ii^{\prime}} \sum\nolimits_{jj^{\prime}} x_{ij} x_{i^{\prime}j^{\prime}}\right)\right]/ns\), with \(j^{\prime} \ne j\) in the second term and \(i^{\prime} \ne i\) in the third term,
and then introducing the subscript k for the considered allele:
and as \(D_{ST(k)}^{\prime} = \left[(s - 1)\sum\nolimits_{i} p_{ik}^{2} - \sum\nolimits_{i,i^{\prime} \ne i} p_{ik} p_{i^{\prime}k}\right]/s(s - 1)\), we obtain \(E(MS_{pop})_{k} = \sigma_{w(k)}^{2} + n\,D_{ST(k)}^{\prime} = H_{S(k)} + n\,D_{ST(k)}^{\prime}\), which corresponds to the result given by Weir (1996) when the sample size of each population is constant (ni = n).
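The two expectations for selfing plants can be checked exactly for small s and n. The following is a minimal sketch, not the authors' code: it enumerates every possible sample, assuming independent draws within each population, and compares the exact expected mean squares with \(H_{S(k)}\) and \(H_{S(k)} + n\,D_{ST(k)}^{\prime}\). The function names and the example frequencies are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def exact_expected_ms(p, n):
    """Exact E(MS_pop) and E(MS_ind/pop) for one allele.

    p: allele frequencies p_i (Fractions), one per population.
    n: number of individuals sampled per population (assumed equal).
    """
    s = len(p)
    e_pop = Fraction(0)
    e_within = Fraction(0)
    for sample in product((0, 1), repeat=s * n):
        rows = [sample[i * n:(i + 1) * n] for i in range(s)]
        # Probability of this sample under independent draws.
        prob = Fraction(1)
        for p_i, row in zip(p, rows):
            for x in row:
                prob *= p_i if x else 1 - p_i
        means = [Fraction(sum(row), n) for row in rows]
        grand = Fraction(sum(sample), s * n)
        between = n * sum(m * m for m in means)
        ss_pop = between - n * s * grand ** 2
        ss_within = sum(sample) - between  # x**2 == x for an indicator
        e_pop += prob * ss_pop / (s - 1)
        e_within += prob * ss_within / (s * (n - 1))
    return e_pop, e_within


def h_s_k(p):
    """Within-population diversity contribution of one allele."""
    return sum(x * (1 - x) for x in p) / len(p)


def d_st_prime_k(p):
    """D'_ST(k) as defined in the text."""
    s = len(p)
    squares = sum(x * x for x in p)
    cross = sum(p[i] * p[j] for i in range(s) for j in range(s) if i != j)
    return ((s - 1) * squares - cross) / (s * (s - 1))


# Hypothetical frequencies: s = 3 populations, n = 2 individuals each.
freqs = [Fraction(1, 4), Fraction(1, 2), Fraction(2, 3)]
n_ind = 2
e_ms_pop, e_ms_within = exact_expected_ms(freqs, n_ind)
```

Because the enumeration grows as 2^(sn), this check is only practical for very small designs; it is meant to illustrate the identity, not to estimate anything from data.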
Case of outcrossing plants
Using the indicator variable xijt, where t indexes the alleles at a locus within individual j of population i, such that xijt = 1 if a given allele is present and xijt = 0 otherwise, the analysis of variation of this indicator variable associated with model (9) is given in Table 1.
To derive the expected mean squares E(MS) for a fixed set of populations, we derive the expectations of four terms: \(E\left[\sum\nolimits_{ijt} x_{ijt}^{2}\right]\), \(E\left[2\sum\nolimits_{ij} x_{ij.}^{2}\right]\), \(E\left[2n\sum\nolimits_{i} x_{i..}^{2}\right]\) and \(E\left[2ns\,x_{...}^{2}\right]\), noting that for the indicator variable x we have \(E_{ijt}(x_{ijt}) = E_{ijt}(x_{ijt}^{2}) = \overline{p} = \sum\nolimits_{i} p_{i}/s\) and \(E_{jt}(x_{ijt}) = p_{i}\).
with t \(\ne\) t', because \(E_{tt^{\prime}}(x_{ijt} x_{ijt^{\prime}}) = P_{i}\), Pi being the frequency of homozygous genotypes for the considered allele in population i;
\(E\left[\sum\nolimits_{jt} x_{ijt}\right]^{2} = E\left[\sum\nolimits_{jt} x_{ijt}^{2}\right] + E\left[\sum\nolimits_{j} \sum\nolimits_{tt^{\prime}} x_{ijt} x_{ijt^{\prime}}\right] + E\left[\sum\nolimits_{jj^{\prime}} \sum\nolimits_{tt^{\prime}} x_{ijt} x_{ij^{\prime}t^{\prime}}\right]\), with t \(\ne\) t' in the second term and j \(\ne\) j' in the third term,
\(E\left[\sum\nolimits_{jt} x_{ijt}^{2}\right] = 2n\,p_{i}\),
\(E\left[\sum\nolimits_{j} \sum\nolimits_{tt^{\prime}} x_{ijt} x_{ijt^{\prime}}\right] = 2n\,P_{i}\),
\(E\left[\sum\nolimits_{jj^{\prime}} \sum\nolimits_{tt^{\prime}} x_{ijt} x_{ij^{\prime}t^{\prime}}\right] = 4n(n - 1)p_{i}^{2}\), and it follows that:
with i \(\ne\) i', j \(\ne\) j', t \(\ne\) t';
\(E\left[\sum\nolimits_{ijt} x_{ijt}^{2}\right] = 2n\sum\nolimits_{i} p_{i}\),
\(E\left[\sum\nolimits_{ij} \sum\nolimits_{tt^{\prime}} x_{ijt} x_{ijt^{\prime}}\right] = 2n\sum\nolimits_{i} P_{i}\),
\(E\left[\sum\nolimits_{i} \sum\nolimits_{jj^{\prime}tt^{\prime}} x_{ijt} x_{ij^{\prime}t^{\prime}}\right] = 4sn(n - 1)E_{i}\left[E_{jt}(x_{ijt})\,E_{j^{\prime}t^{\prime}}(x_{ij^{\prime}t^{\prime}})\right] = 4n(n - 1)\sum\nolimits_{i} p_{i}^{2}\),
\(E\left[\sum\nolimits_{ii^{\prime}} \sum\nolimits_{jj^{\prime}tt^{\prime}} x_{ijt} x_{i^{\prime}j^{\prime}t^{\prime}}\right] = 4n^{2}\sum\nolimits_{ii^{\prime}} E_{jt}(x_{ijt})\,E_{j^{\prime}t^{\prime}}(x_{i^{\prime}j^{\prime}t^{\prime}}) = 4n^{2}\sum\nolimits_{ii^{\prime}} p_{i} p_{i^{\prime}}\), and it follows that:
Adding the subscript k to pi and Pi for the frequency of allele k and the frequency of the homozygous genotype for this allele in population i, we get for the expected mean square E(MS) = E(SS)/df, in terms of pik and Pik:
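The three expected mean squares can be reconstructed from the four expectations above. This is a sketch under stated assumptions: the sums of squares are taken as successive differences of the four terms, with s − 1, s(n − 1) and sn degrees of freedom; the label \(MS_{all/ind}\) for the mean square between alleles within individuals is a hypothetical name; and the numbering (A11) to (A13) is inferred from the references in the text that follows.

\(E(MS_{pop})_{k} = \sum\nolimits_{i} \left[p_{ik}(1 - p_{ik}) + (P_{ik} - p_{ik}^{2})\right]/s + 2n\left[(s - 1)\sum\nolimits_{i} p_{ik}^{2} - \sum\nolimits_{i,i^{\prime} \ne i} p_{ik} p_{i^{\prime}k}\right]/s(s - 1)\) (A11)

\(E(MS_{ind/pop})_{k} = \sum\nolimits_{i} \left[p_{ik}(1 - p_{ik}) + (P_{ik} - p_{ik}^{2})\right]/s\) (A12)

\(E(MS_{all/ind})_{k} = \sum\nolimits_{i} (p_{ik} - P_{ik})/s\) (A13)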
Summing over all alleles from (A13) we can write
then with \(\sum\nolimits_{ik} p_{ik}(1 - p_{ik})/s = H_{S}\), \(\sum\nolimits_{ik} p_{ik}^{2}/s = 1 - H_{S}\) and \(\sum\nolimits_{ik} P_{ik}/s = 1 - H_{0}\), \(H_{0}\) being the average frequency of heterozygotes across populations, it follows that
Noting that \(\sum\nolimits_{ik} P_{ik}/s - \sum\nolimits_{ik} p_{ik}^{2}/s = H_{S} - H_{0} = \overline{\Delta}\), the global departure from random mating equilibrium, from (A12) we can write
Finally, by noting that \(D_{ST}^{\prime} = \left[(s - 1)\sum\nolimits_{ik} p_{ik}^{2} - \sum\nolimits_{i,i^{\prime} \ne i,k} p_{ik} p_{i^{\prime}k}\right]/s(s - 1)\), from (A11) we can write in terms of Nei’s parameters the sum over all alleles of the between-population expected mean squares:
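The outcrossing case can be checked in the same exact way for a small design. The sketch below is not the authors' code. For one allele it enumerates all samples of diploid genotypes, assuming independent individuals and sums of squares built as successive differences of the four terms listed earlier. It then compares the exact expected mean squares with the per-allele contributions to \(H_{0}\), to \(H_{S} + \overline{\Delta}\), and to \(H_{S} + \overline{\Delta} + 2n\,D_{ST}^{\prime}\). These closed forms are the ones implied by the expectations above; the function names and the example frequencies are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

STATES = [(1, 1), (1, 0), (0, 1), (0, 0)]  # indicator pairs (x_ij1, x_ij2)


def genotype_probs(p, big_p):
    """Probabilities of the indicator pairs given allele frequency p
    and homozygote frequency big_p in one population."""
    return {(1, 1): big_p, (1, 0): p - big_p,
            (0, 1): p - big_p, (0, 0): 1 - 2 * p + big_p}


def exact_expected_ms(p, big_p, n):
    """Exact E(MS) for: populations, individuals within populations,
    alleles within individuals (one allele, equal n)."""
    s = len(p)
    probs = [genotype_probs(p[i], big_p[i]) for i in range(s)]
    e_pop = e_ind = e_all = Fraction(0)
    for sample in product(STATES, repeat=s * n):
        prob = Fraction(1)
        for idx, g in enumerate(sample):
            prob *= probs[idx // n][g]
        ind_means = [Fraction(sum(g), 2) for g in sample]       # x_ij.
        pop_means = [sum(ind_means[i * n:(i + 1) * n]) / n
                     for i in range(s)]                          # x_i..
        grand = sum(pop_means) / s                               # x_...
        t1 = sum(sum(g) for g in sample)  # sum of x**2, since x**2 == x
        t2 = 2 * sum(m * m for m in ind_means)
        t3 = 2 * n * sum(m * m for m in pop_means)
        t4 = 2 * n * s * grand ** 2
        e_pop += prob * (t3 - t4) / (s - 1)
        e_ind += prob * (t2 - t3) / (s * (n - 1))
        e_all += prob * (t1 - t2) / (s * n)
    return e_pop, e_ind, e_all


def closed_forms(p, big_p, n):
    """Per-allele closed forms implied by the expectations in the text."""
    s = len(p)
    h_s = sum(x * (1 - x) for x in p) / s
    h_0 = sum(x - y for x, y in zip(p, big_p)) / s
    delta = sum(y - x * x for x, y in zip(p, big_p)) / s
    cross = sum(p[i] * p[j] for i in range(s) for j in range(s) if i != j)
    d_st = ((s - 1) * sum(x * x for x in p) - cross) / (s * (s - 1))
    return h_s + delta + 2 * n * d_st, h_s + delta, h_0


# Hypothetical values: s = 2 populations, n = 2 individuals each.
freqs = [Fraction(1, 2), Fraction(1, 4)]
homoz = [Fraction(3, 10), Fraction(1, 8)]
exact = exact_expected_ms(freqs, homoz, 2)
expected = closed_forms(freqs, homoz, 2)
```

Since expectations are additive over alleles, agreement for each allele implies agreement for the sums over alleles that define \(H_{0}\), \(H_{S}\), \(\overline{\Delta}\) and \(D_{ST}^{\prime}\).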
Cite this article
Gallais, A., Lefèvre, F. ANOVA for estimating Nei’s diversity and related parameters in a fixed set of populations with an application in genetic resources conservation. Euphytica 217, 192 (2021). https://doi.org/10.1007/s10681-021-02904-x
DOI: https://doi.org/10.1007/s10681-021-02904-x