
ANOVA for estimating Nei’s diversity and related parameters in a fixed set of populations with an application in genetic resources conservation

Euphytica

Abstract

Genetic diversity parameters are used by plant breeders to develop efficient genetic resources sampling and conservation strategies. Extending previous developments on the use of ANOVA on allele frequencies in a fixed set of populations, we show that this approach allows unbiased estimation of diversity parameters, including Nei’s diversity parameters (HS, the within-population diversity; DST, the between-population differentiation; and HT, the total gene diversity) and other related parameters well suited for guiding conservation decisions. We consider two cases: selfing plants and outcrossing plants. For outcrossing plants, this approach also allows estimation of the average frequency of heterozygotes (H0) and of the average departure of populations from random-mating equilibrium. These unbiased ANOVA estimators correspond to those derived by Nei and Chesser (Ann Hum Genet 47:253–259, 1983) from properties of the multinomial sampling of genotypes. With an equal number of individuals sampled per population, we first develop analyses of variation for each allele at one locus. Then, considering the whole set of alleles, we show the correspondence between the sum over alleles of the variances in allele frequencies and Nei’s within- and between-population diversities. Considering large populations leads to Nei’s relationship, HT = HS + DST, which decomposes the total variance in allele frequencies into within- and between-population variance components, where variance means the sum, over the whole set of alleles, of the variance of each allele. Finally, we use theoretical results of the ANOVA approach to consider a genetic resources conservation design with only one individual per population, which allows Nei’s total gene diversity to be maintained.
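The decomposition HT = HS + DST can be illustrated numerically. The Python sketch below is not the authors' code: the function name `nei_decomposition` and the allele frequencies are hypothetical, and the frequencies are treated as known population values (no sampling). It computes HS as the mean within-population diversity, HT from the mean allele frequencies, and DST as the sum over alleles of the between-population variance of allele frequencies (divisor s), so that HT = HS + DST holds exactly.

```python
# Nei's gene diversity decomposition for a fixed set of populations,
# computed from known (population-level) allele frequencies p[i][k].
# Illustrative sketch only: the frequencies below are hypothetical.

def nei_decomposition(p):
    """Return (H_S, H_T, D_ST) for a list of per-population allele-frequency lists."""
    s = len(p)          # number of populations
    m = len(p[0])       # number of alleles at the locus
    # H_S: mean over populations of 1 - sum_k p_ik^2
    h_s = sum(1.0 - sum(f * f for f in pop) for pop in p) / s
    # mean allele frequencies over the s populations
    p_bar = [sum(pop[k] for pop in p) / s for k in range(m)]
    # H_T: gene diversity computed from the mean allele frequencies
    h_t = 1.0 - sum(f * f for f in p_bar)
    # D_ST: sum over alleles of the between-population variance (divisor s)
    d_st = sum(sum((pop[k] - p_bar[k]) ** 2 for pop in p) / s for k in range(m))
    return h_s, h_t, d_st


freqs = [[0.7, 0.2, 0.1],
         [0.3, 0.5, 0.2],
         [0.5, 0.3, 0.2]]
h_s, h_t, d_st = nei_decomposition(freqs)
print(f"H_S={h_s:.4f}  H_T={h_t:.4f}  D_ST={d_st:.4f}  H_S+D_ST={h_s + d_st:.4f}")
```

The identity holds because HT − HS equals, for each allele, the mean of the squared frequencies minus the square of the mean frequency, which is exactly the between-population variance with divisor s.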

References

  • Brown AHD, Munday J (1982) Population-genetic structure and optimal sampling of land races of barley from Iran. Genetica 58:85–96

  • Cockerham CC (1969) Variance of gene frequencies. Evolution 23:72–84

  • Cockerham CC (1973) Analyses of gene frequencies. Genetics 74:679–700

  • Cockerham CC, Weir BS (1986) Estimation of inbreeding parameters in stratified populations. Ann Hum Genet 50:271–281

  • Excoffier L, Heckel G (2006) Computer programs for population genetics data analysis: a survival guide. Nat Rev Genet 7:745–758

  • Gouesnard B, Bataillon TM, Decoux G, Rozale C, Schoen DJ, David JL (2001) MSTRAT: an algorithm for building germ plasm core collections by maximizing allelic or phenotypic richness. J Hered 92:93–94

  • Gregorius HR (1988) The meaning of genetic variation within and between subpopulations. Theor Appl Genet 76:947–951

  • Holsinger KE, Weir BS (2009) Genetics in geographically structured populations: defining, estimating and interpreting FST. Nat Rev Genet 10:639–650

  • Lefèvre F, Gallais A (2020) Partitioning expected heterozygosity in subdivided populations: some misuses of Nei’s decomposition and an alternative probabilistic approach. Mol Ecol 29:2957–2962

  • Nei M (1973) Analysis of gene diversity in subdivided populations. Proc Natl Acad Sci USA 70:3321–3323

  • Nei M (1986) Definition and estimation of fixation indices. Evolution 40:643–645

  • Nei M, Chakravarti A (1977) Drift variances of FST and GST statistics obtained from a finite number of isolated populations. Theor Popul Biol 11:307–325

  • Nei M, Chesser RK (1983) Estimation of fixation indices and gene diversities. Ann Hum Genet 47:253–259

  • Ollivier L, Foulley JL (2005) Aggregate diversity: new approach combining within- and between-breed genetic diversity. Livest Prod Sci 95:247–254

  • Petit RJ, El Moussadik A, Pons O (1998) Identifying populations for conservation on the basis of genetic markers. Conserv Biol 12:844–855

  • Pons O, Chaouche K (1995) Estimation, variance and optimal sampling of gene diversity II. Diploid locus. Theor Appl Genet 91:122–130

  • Pons O, Petit RJ (1995) Estimation, variance and optimal sampling of gene diversity I. Haploid locus. Theor Appl Genet 90:462–470

  • Scheffé H (1959) The analysis of variance. Wiley, New York, 477 p

  • Van Treuren R, Engels J, Hoekstra R, Van Hintum T (2009) Optimization of the composition of crop collections for ex situ conservation. Plant Genet Resour 7:185–193

  • Weir BS, Cockerham CC (1984) Estimating F-statistics for the analysis of population structure. Evolution 38:1358–1370

  • Weir BS (1996) Genetic data analysis II: methods for discrete population genetic data. Sinauer Associates, Sunderland, Massachusetts, 445 p

  • Wright S (1951) The genetical structure of populations. Ann Eugen 15:323–354

  • Wright S (1965) The interpretation of population structure by F-statistics with special regard to systems of mating. Evolution 19:395–420

  • Yonezawa K (1985) A definition of the optimal allocation of effort in conservation of plant genetic resources with application to sample size determination for field collection. Euphytica 34:345–354

Funding

No funding was received for conducting this study.

Author information

Contributions

AG and FL have contributed to the content and the writing of the article.

Corresponding author

Correspondence to André Gallais.

Ethics declarations

Conflict of interest

André Gallais and François Lefèvre declare that they have no conflict of interest.

Consent to participate

Both authors agreed on the research and the manuscript.

Consent for publication

Both authors consent to submit the manuscript for publication in Euphytica.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Derivation of the expected mean squares for the ANOVA on allele frequencies

Case of selfing plants

Using the indicator variable \(x_{ij}\), such that \(x_{ij} = 1\) if a given allele k is present in individual j of population i and \(x_{ij} = 0\) otherwise, and according to model (1), the sums of squares (\(SS_k\)) of the corresponding ANOVA for allele k are as follows, where \(x_{i.}\) denotes the mean of population i and \(x_{..}\) the overall mean:

  • \(SS_k\) between populations (pop): \((SS_{pop})_k = n\sum_{i} x_{i.}^{2} - ns\,x_{..}^{2}\), with \(s - 1\) degrees of freedom (df);

  • \(SS_k\) within populations (ind/pop): \((SS_{ind/pop})_k = \sum_{ij} x_{ij}^{2} - n\sum_{i} x_{i.}^{2}\), with \(s(n - 1)\) df;

  • total \(SS_k\): \((SS_{Tot})_k = \sum_{ij} x_{ij}^{2} - ns\,x_{..}^{2}\), with \(sn - 1\) df.
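A minimal sketch of these three sums of squares for one allele, assuming a complete 0/1 data matrix with the same number n of individuals in each of the s populations; the function name `selfing_sums_of_squares` and the small data set are hypothetical. The final check simply confirms that the between- and within-population sums of squares add up to the total.

```python
# Sums of squares of the one-way ANOVA on a 0/1 allele indicator (selfing case).
# x[i][j] = 1 if individual j of population i carries allele k, else 0;
# the same number n of individuals is assumed in every population.

def selfing_sums_of_squares(x):
    s, n = len(x), len(x[0])
    pop_means = [sum(row) / n for row in x]          # x_i.
    grand_mean = sum(pop_means) / s                  # x_..
    sum_sq = sum(v * v for row in x for v in row)    # sum_ij x_ij^2
    ss = {
        "pop": n * sum(m * m for m in pop_means) - n * s * grand_mean ** 2,
        "ind/pop": sum_sq - n * sum(m * m for m in pop_means),
        "total": sum_sq - n * s * grand_mean ** 2,
    }
    df = {"pop": s - 1, "ind/pop": s * (n - 1), "total": s * n - 1}
    return ss, df


x = [[1, 1, 0, 1],   # population 1
     [0, 0, 1, 0],   # population 2
     [1, 0, 1, 1]]   # population 3
ss, df = selfing_sums_of_squares(x)
print(ss, df)
print(abs(ss["pop"] + ss["ind/pop"] - ss["total"]) < 1e-12)
```

With 0/1 data, \(\sum_{ij} x_{ij}^2\) is simply the number of individuals carrying the allele, which is why the expectations that follow reduce to functions of the \(p_i\).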

The corresponding expected mean squares, \(E(MS)_k = E(SS_k/\mathrm{df})\), i.e. \(E(MS_{pop})_k\), \(E(MS_{ind/pop})_k\) and \(E(MS_{Tot})_k\), can be derived for a fixed set of populations and a given allele k. Denoting by \(p_i\) the frequency of the considered allele in population i, we use the following expressions: \(E_j(x_{ij}) = E_j(x_{ij}^{2}) = p_i\) (since \(x_{ij}^{2} = x_{ij}\)) and \(E_{ij}(x_{ij}) = E_{ij}(x_{ij}^{2}) = E_{i}(x_{i.}) = E(x_{..}) = \sum_{i} p_{i}/s = \overline{p}\).

Expectation \(E(MS_{ind/pop})_k = \sigma_{w(k)}^{2}\)

For a given allele k, we can write

$$ E(MS_{ind/pop})_k = E\left[\sum_{ij} x_{ij}^{2} - \sum_{i}\left(\sum_{j} x_{ij}\right)^{2}/n\right]/[s(n - 1)], $$

$$ E\left(\sum_{ij} x_{ij}^{2}\right) = \sum_{i} n\,E_j(x_{ij}^{2}) = n\sum_{i} p_{i} \quad \text{because} \quad E_{j}(x_{ij}^{2}) = E_{j}(x_{ij}) = p_{i}, \tag{A1} $$
$$ E\left[\sum_{i}\left(\sum_{j} x_{ij}\right)^{2}/n\right] = E\left[\sum_{ij} x_{ij}^{2}\right]/n + E\left[\sum_{i}\sum_{j \ne j'} x_{ij}x_{ij'}\right]/n = \sum_{i} p_{i} + (n - 1)\sum_{i} p_{i}^{2}, \tag{A2} $$

where the second sum runs over the n(n − 1) ordered pairs of distinct individuals of population i, for which \(E(x_{ij}x_{ij'}) = p_i^{2}\) because individuals are sampled independently; hence
$$ \sigma_{w(k)}^{2} = \frac{(n - 1)\sum_{i} p_{i} - (n - 1)\sum_{i} p_{i}^{2}}{s(n - 1)} = \sum_{i} p_{i}(1 - p_{i})/s, $$

or, introducing the subscript k for the considered allele,

$$ \sigma_{w(k)}^{2} = \frac{(n - 1)\sum_{i} p_{ik} - (n - 1)\sum_{i} p_{ik}^{2}}{s(n - 1)} = \sum_{i} p_{ik}(1 - p_{ik})/s = H_{S(k)}. \tag{A3} $$
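Result (A3) can be checked by simulation. The sketch below is a Monte Carlo check, not part of the article's derivation: it draws n individuals per population, each carrying allele k with probability \(p_i\), computes \((MS_{ind/pop})_k\) and averages it over many replicates. The frequencies, n, number of replicates, seed and the function name `simulated_ms_within` are arbitrary assumptions. The average should be close to \(H_{S(k)} = \sum_i p_i(1 - p_i)/s\).

```python
import random


def simulated_ms_within(p, n, reps=20000, seed=1):
    """Average (MS_ind/pop)_k over `reps` simulated samples of n individuals per
    population, each individual carrying allele k with probability p_i."""
    rng = random.Random(seed)
    s = len(p)
    total = 0.0
    for _ in range(reps):
        ss_within = 0.0
        for p_i in p:
            c = sum(rng.random() < p_i for _ in range(n))  # allele count in pop i
            # for 0/1 data: sum_j x_ij^2 - (sum_j x_ij)^2 / n = c - c^2 / n
            ss_within += c - c * c / n
        total += ss_within / (s * (n - 1))
    return total / reps


p, n = [0.2, 0.5, 0.9], 5                      # hypothetical frequencies
h_s_k = sum(q * (1 - q) for q in p) / len(p)   # H_S(k) from (A3)
print(round(simulated_ms_within(p, n), 3), round(h_s_k, 3))
```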

Expectation \(E(MS_{pop})_k = E[(SS_{pop})_k/(s - 1)]\)

$$ E(MS_{pop})_k = E\left[\sum_{i}\left(\sum_{j} x_{ij}\right)^{2}/n - \left(\sum_{ij} x_{ij}\right)^{2}/(ns)\right]/(s - 1), $$
$$ E\left[\sum_{i}\left(\sum_{j} x_{ij}\right)^{2}/n\right] = \sum_{i} p_{i} + (n - 1)\sum_{i} p_{i}^{2} \quad \text{from (A2)}, \tag{A4} $$

$$ E\left[\left(\sum_{ij} x_{ij}\right)^{2}\right]/(ns) = \left[E\left(\sum_{ij} x_{ij}^{2}\right) + E\left(\sum_{i}\sum_{j \ne j'} x_{ij}x_{ij'}\right) + E\left(\sum_{i \ne i'}\sum_{jj'} x_{ij}x_{i'j'}\right)\right]/(ns), $$

$$ E\left[\left(\sum_{ij} x_{ij}\right)^{2}\right]/(ns) = \sum_{i} p_{i}/s + (n - 1)\sum_{i} p_{i}^{2}/s + n\sum_{i \ne i'} p_{i}p_{i'}/s, \tag{A5} $$

and then introducing the subscript k for the considered allele:

$$ E(MS_{pop})_k = \sum_{i} p_{ik}(1 - p_{ik})/s + \frac{n}{s(s - 1)}\left[(s - 1)\sum_{i} p_{ik}^{2} - \sum_{i \ne i'} p_{ik}p_{i'k}\right], \tag{A6} $$

and, since \(D_{ST(k)}^{\prime} = \left[(s - 1)\sum_{i} p_{ik}^{2} - \sum_{i \ne i'} p_{ik}p_{i'k}\right]/[s(s - 1)]\), \(E(MS_{pop})_k = \sigma_{w(k)}^{2} + n\,D_{ST(k)}^{\prime} = H_{S(k)} + n\,D_{ST(k)}^{\prime}\), which corresponds to the result given by Weir (1996) when the sample size of each population is constant (\(n_i = n\)).
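Because the sampling model only involves the allele counts \(c_i \sim \mathrm{Bin}(n, p_i)\), result (A6) can also be checked exactly by summing over all possible counts. The sketch below does this for one allele in three hypothetical populations; the helper names `exact_e_ms_pop` and `d_st_prime` are illustrative. It also checks an algebraic identity that follows from the definition: since \(\sum_{i \ne i'} p_{ik}p_{i'k} = (\sum_i p_{ik})^2 - \sum_i p_{ik}^2\), \(D'_{ST(k)} = \sum_i (p_{ik} - \overline{p}_k)^2/(s - 1)\), the variance of allele frequencies across populations with divisor s − 1 (Nei's \(D_{ST(k)}\) uses divisor s).

```python
from itertools import product
from math import comb
from statistics import variance


def exact_e_ms_pop(p, n):
    """Exact E(MS_pop)_k for one allele in s fixed populations, summing over all
    combinations of binomial allele counts c_i ~ Bin(n, p_i)."""
    s = len(p)
    expected = 0.0
    for counts in product(range(n + 1), repeat=s):
        prob = 1.0
        for c, p_i in zip(counts, p):
            prob *= comb(n, c) * p_i ** c * (1 - p_i) ** (n - c)
        # SS_pop = sum_i c_i^2 / n - (sum_i c_i)^2 / (n s)
        ss_pop = sum(c * c for c in counts) / n - sum(counts) ** 2 / (n * s)
        expected += prob * ss_pop / (s - 1)
    return expected


def d_st_prime(p):
    """D'_ST(k) = [(s-1) sum_i p_i^2 - sum_{i != i'} p_i p_i'] / [s (s-1)]."""
    s = len(p)
    sum_sq = sum(q * q for q in p)
    cross = sum(p) ** 2 - sum_sq   # sum over ordered pairs i != i'
    return ((s - 1) * sum_sq - cross) / (s * (s - 1))


p, n = [0.2, 0.5, 0.9], 4                      # hypothetical values
h_s_k = sum(q * (1 - q) for q in p) / len(p)
print(exact_e_ms_pop(p, n), h_s_k + n * d_st_prime(p))   # (A6)
print(d_st_prime(p), variance(p))                        # divisor s - 1
```

Together with (A3), this suggests the moment estimator \([(MS_{pop})_k - (MS_{ind/pop})_k]/n\) for \(D'_{ST(k)}\), whose expectation is \(D'_{ST(k)}\) by (A3) and (A6).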

Case of outcrossing plants

Using the indicator variable \(x_{ijt}\), where t indexes the two alleles at a locus within individual j of population i, such that \(x_{ijt} = 1\) if a given allele is present and \(x_{ijt} = 0\) otherwise, the analysis of variation of this indicator variable associated with model (9) is laid out in Table 1.

Table 1 Analysis of variance layout for the variable x indicating a given allele in fixed populations, for s populations and n individuals per population
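Table 1 itself is not reproduced here. As a sketch, assuming the nested layout implied by the four terms listed in the next paragraph (between populations, between individuals within populations, between alleles within individuals), the sums of squares for one allele can be computed from the number of copies (0, 1 or 2) of the allele carried by each individual. The function name and data are hypothetical; the degrees of freedom s − 1, s(n − 1), sn and 2sn − 1 are the divisors consistent with the expected mean squares (A11) to (A14).

```python
# Nested ANOVA sums of squares on a 0/1 allele indicator x_ijt (outcrossing case).
# g[i][j] = number of copies (0, 1 or 2) of allele k in individual j of pop i.

def outcrossing_sums_of_squares(g):
    s, n = len(g), len(g[0])
    flat = [c for row in g for c in row]
    t1 = sum(flat)                                  # sum_ijt x_ijt^2 (0/1 data)
    t2 = sum(c * c for c in flat) / 2               # 2 sum_ij x_ij.^2
    t3 = sum(sum(row) ** 2 for row in g) / (2 * n)  # 2n sum_i x_i..^2
    t4 = sum(flat) ** 2 / (2 * n * s)               # 2ns x_...^2
    ss = {"pop": t3 - t4, "ind/pop": t2 - t3, "all/ind": t1 - t2, "total": t1 - t4}
    df = {"pop": s - 1, "ind/pop": s * (n - 1), "all/ind": s * n,
          "total": 2 * s * n - 1}
    return ss, df


g = [[2, 1, 0],   # population 1: homozygote kk, heterozygote, no copy
     [1, 1, 0]]   # population 2
ss, df = outcrossing_sums_of_squares(g)
print(ss, df)
```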

To derive the expected mean squares E(MS) for a fixed set of populations, we derive the expectations of four terms: \(E\left[\sum_{ijt} x_{ijt}^{2}\right]\), \(E\left[2\sum_{ij} x_{ij.}^{2}\right]\), \(E\left[2n\sum_{i} x_{i..}^{2}\right]\) and \(E\left[2ns\,x_{...}^{2}\right]\), where \(x_{ij.}\), \(x_{i..}\) and \(x_{...}\) denote the means over the two alleles of individual j, over the 2n alleles of population i and over all 2ns alleles, respectively. For the indicator variable x we have \(E_{ijt}(x_{ijt}) = E_{ijt}(x_{ijt}^{2}) = \overline{p} = \sum_{i} p_{i}/s\) and \(E_{jt}(x_{ijt}) = p_{i}\).

$$ \bullet\;\; E\left[\sum_{ijt} x_{ijt}^{2}\right] = 2ns\,\overline{p} = 2n\sum_{i} p_{i}. \tag{A7} $$
$$ \bullet\;\; E\left[2\sum_{ij} x_{ij.}^{2}\right] = \frac{1}{2}E\left[\sum_{ij}\left(\sum_{t} x_{ijt}\right)^{2}\right] = \frac{1}{2}E\left[\sum_{ij}\left(\sum_{t} x_{ijt}^{2} + \sum_{t \ne t'} x_{ijt}x_{ijt'}\right)\right] = n\sum_{i} p_{i} + n\sum_{i} P_{i}, \tag{A8} $$

with \(t' \ne t\), because \(E(x_{ijt}x_{ijt'}) = P_{i}\), \(P_{i}\) being the frequency of the genotype homozygous for the considered allele in population i;

$$ \bullet\;\; 2n\,E\left[\sum_{i} x_{i..}^{2}\right] = E\left[\sum_{i}\left(\sum_{jt} x_{ijt}\right)^{2}\right]/(2n), $$

\(E\left[\left(\sum_{jt} x_{ijt}\right)^{2}\right] = E\left[\sum_{jt} x_{ijt}^{2}\right] + E\left[\sum_{j}\sum_{t \ne t'} x_{ijt}x_{ijt'}\right] + E\left[\sum_{j \ne j'}\sum_{tt'} x_{ijt}x_{ij't'}\right]\), where t and t′ take all values in the third sum, with

\(E\left[\sum_{jt} x_{ijt}^{2}\right] = 2n\,p_{i}\),

\(E\left[\sum_{j}\sum_{t \ne t'} x_{ijt}x_{ijt'}\right] = 2n\,P_{i}\),

\(E\left[\sum_{j \ne j'}\sum_{tt'} x_{ijt}x_{ij't'}\right] = 4n(n - 1)p_{i}^{2}\), so that

$$ 2n\,E\left[\sum_{i} x_{i..}^{2}\right] = \sum_{i} p_{i} + \sum_{i} P_{i} + 2(n - 1)\sum_{i} p_{i}^{2}. \tag{A9} $$
$$ \bullet\;\; 2ns\,E\left[x_{...}^{2}\right] = E\left[\left(\sum_{ijt} x_{ijt}\right)^{2}\right]/(2ns), $$
$$ E\left[\left(\sum_{ijt} x_{ijt}\right)^{2}\right] = E\left[\sum_{ijt} x_{ijt}^{2}\right] + E\left[\sum_{ij}\sum_{t \ne t'} x_{ijt}x_{ijt'}\right] + E\left[\sum_{i}\sum_{j \ne j'}\sum_{tt'} x_{ijt}x_{ij't'}\right] + E\left[\sum_{i \ne i'}\sum_{jj'tt'} x_{ijt}x_{i'j't'}\right], $$

with

\(E\left[\sum_{ijt} x_{ijt}^{2}\right] = 2n\sum_{i} p_{i}\),

\(E\left[\sum_{ij}\sum_{t \ne t'} x_{ijt}x_{ijt'}\right] = 2n\sum_{i} P_{i}\),

\(E\left[\sum_{i}\sum_{j \ne j'}\sum_{tt'} x_{ijt}x_{ij't'}\right] = 4sn(n - 1)E_{i}\left[E_{jt}(x_{ijt})\,E_{j't'}(x_{ij't'})\right] = 4n(n - 1)\sum_{i} p_{i}^{2}\),

\(E\left[\sum_{i \ne i'}\sum_{jj'tt'} x_{ijt}x_{i'j't'}\right] = 4n^{2}\sum_{i \ne i'} E_{jt}(x_{ijt})\,E_{j't'}(x_{i'j't'}) = 4n^{2}\sum_{i \ne i'} p_{i}p_{i'}\), so that

$$ E\left[\left(\sum_{ijt} x_{ijt}\right)^{2}\right]/(2ns) = \sum_{i} p_{i}/s + \sum_{i} P_{i}/s + 2(n - 1)\sum_{i} p_{i}^{2}/s + 2n\sum_{i \ne i'} p_{i}p_{i'}/s. \tag{A10} $$

Adding the subscript k to \(p_i\) and \(P_i\) for the frequency of allele k and the frequency of the genotype homozygous for this allele in population i, we obtain the expected mean squares E(MS) = E(SS)/df in terms of \(p_{ik}\) and \(P_{ik}\):

$$ E(MS_{pop})_k = \sum_{i} p_{ik}/s + \sum_{i} P_{ik}/s + 2(n - 1)\sum_{i} p_{ik}^{2}/s - \frac{2n}{s(s - 1)}\sum_{i \ne i'} p_{ik}p_{i'k}, \tag{A11} $$
$$ E(MS_{ind/pop})_k = \sum_{i} p_{ik}/s + \sum_{i} P_{ik}/s - 2\sum_{i} p_{ik}^{2}/s, \tag{A12} $$
$$ E(MS_{all/ind})_k = \sum_{i} p_{ik}/s - \sum_{i} P_{ik}/s, \tag{A13} $$
$$ E(MS_{Tot})_k = \sum_{i} p_{ik}/s - \left[\sum_{i} P_{ik}/s + 2(n - 1)\sum_{i} p_{ik}^{2}/s + 2n\sum_{i \ne i'} p_{ik}p_{i'k}/s\right]/(2sn - 1). \tag{A14} $$
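Under the sampling model, individuals of population i are drawn independently with genotype frequencies \(P_i\) (homozygote for allele k), \(h_i\) (heterozygote carrying one copy of k) and \(1 - P_i - h_i\), so that \(p_i = P_i + h_i/2\). For a small design, (A11) to (A13) can therefore be checked exactly by enumerating every configuration of copy numbers. The sketch below does so for s = 2 and n = 3 (3^6 = 729 configurations); the genotype frequencies, the function names and the parametrization by \(h_i\) are assumptions introduced for this check.

```python
from itertools import product


def exact_mean_squares(geno, n):
    """Exact E(MS) for (pop, ind/pop, all/ind) by enumerating every configuration
    of copy numbers (0, 1, 2) of allele k over the s*n sampled individuals.
    geno[i] = (P_i, h_i): frequencies in population i of the homozygote for
    allele k and of heterozygotes carrying one copy of k."""
    s = len(geno)
    dist = [{2: P, 1: h, 0: 1.0 - P - h} for P, h in geno]
    e = [0.0, 0.0, 0.0]
    for g in product((0, 1, 2), repeat=s * n):
        rows = [g[i * n:(i + 1) * n] for i in range(s)]
        prob = 1.0
        for i, row in enumerate(rows):
            for c in row:
                prob *= dist[i][c]
        t1 = sum(g)                                     # sum x_ijt^2
        t2 = sum(c * c for c in g) / 2                  # 2 sum x_ij.^2
        t3 = sum(sum(r) ** 2 for r in rows) / (2 * n)   # 2n sum x_i..^2
        t4 = sum(g) ** 2 / (2 * n * s)                  # 2ns x_...^2
        e[0] += prob * (t3 - t4) / (s - 1)
        e[1] += prob * (t2 - t3) / (s * (n - 1))
        e[2] += prob * (t1 - t2) / (s * n)
    return e


def formula_mean_squares(geno, n):
    """Right-hand sides of (A11), (A12) and (A13), with p_i = P_i + h_i / 2."""
    s = len(geno)
    p = [P + h / 2 for P, h in geno]
    sp, s_hom = sum(p), sum(P for P, _ in geno)
    sp2 = sum(q * q for q in p)
    cross = sp ** 2 - sp2                  # sum over ordered pairs i != i'
    ms_pop = sp / s + s_hom / s + 2 * (n - 1) * sp2 / s - 2 * n * cross / (s * (s - 1))
    ms_ind_pop = sp / s + s_hom / s - 2 * sp2 / s
    ms_all_ind = sp / s - s_hom / s
    return [ms_pop, ms_ind_pop, ms_all_ind]


geno, n = [(0.3, 0.4), (0.1, 0.2)], 3     # hypothetical genotype frequencies
print(exact_mean_squares(geno, n))
print(formula_mean_squares(geno, n))
```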

Summing over all alleles from (A13) we can write

$$ \sum_{k} E(MS_{all/ind})_{k} = \left[\sum_{ik} p_{ik}(1 - p_{ik}) + \sum_{ik} p_{ik}^{2} - \sum_{ik} P_{ik}\right]/s, $$

then, with \(\sum_{ik} p_{ik}(1 - p_{ik})/s = H_{S}\), \(\sum_{ik} p_{ik}^{2}/s = 1 - H_{S}\) and \(\sum_{ik} P_{ik}/s = 1 - H_{0}\) (summed over k, the \(P_{ik}\) give the frequency of homozygotes in population i), \(H_{0}\) being the average frequency of heterozygotes across populations, it follows that

$$ \sum_{k} E(MS_{all/ind})_{k} = H_{0}. \tag{A15} $$

Noting that \(\sum_{ik} P_{ik}/s - \sum_{ik} p_{ik}^{2}/s = H_{S} - H_{0} = \overline{\Delta}\), the global departure from random-mating equilibrium, we can write from (A12)

$$ \sum_{k} E(MS_{ind/pop})_{k} = H_{0} + 2\left(\sum_{ik} P_{ik}/s - \sum_{ik} p_{ik}^{2}/s\right) = H_{0} + 2(H_{S} - H_{0}). \tag{A16} $$
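Since (A15) and (A16) are identities between population parameters, they can be checked directly from genotype frequencies. The sketch below uses two hypothetical populations at a three-allele locus, derives \(p_{ik}\) and \(P_{ik}\), sums the right-hand sides of (A13) and (A12) over alleles, and compares them with \(H_0\) and \(H_0 + 2(H_S - H_0)\); the genotype table and variable names are illustrative assumptions.

```python
# Genotype frequencies per population: keys are unordered allele pairs (a, b),
# values sum to 1 within each population (hypothetical numbers).
pops = [
    {("A", "A"): 0.30, ("A", "B"): 0.40, ("B", "B"): 0.10,
     ("A", "C"): 0.10, ("C", "C"): 0.10},
    {("A", "A"): 0.10, ("A", "B"): 0.20, ("B", "B"): 0.40,
     ("B", "C"): 0.20, ("C", "C"): 0.10},
]


def allele_and_homozygote_freqs(genotypes):
    """Return p[k] (allele frequency) and P[k] (frequency of genotype kk)."""
    p, P = {}, {}
    for (a, b), f in genotypes.items():
        if a == b:
            P[a] = P.get(a, 0.0) + f
            p[a] = p.get(a, 0.0) + f
        else:
            p[a] = p.get(a, 0.0) + f / 2
            p[b] = p.get(b, 0.0) + f / 2
    return p, P


s = len(pops)
alleles = sorted({a for g in pops for pair in g for a in pair})
sum_ms_all_ind = sum_ms_ind_pop = h_0 = h_s = 0.0
for g in pops:
    p, P = allele_and_homozygote_freqs(g)
    for k in alleles:
        pk, Pk = p.get(k, 0.0), P.get(k, 0.0)
        sum_ms_all_ind += (pk - Pk) / s                  # (A13), summed over k
        sum_ms_ind_pop += (pk + Pk - 2 * pk ** 2) / s    # (A12), summed over k
    h_0 += sum(f for (a, b), f in g.items() if a != b) / s
    h_s += (1.0 - sum(v * v for v in p.values())) / s

print(sum_ms_all_ind, h_0)                        # (A15)
print(sum_ms_ind_pop, h_0 + 2 * (h_s - h_0))      # (A16)
```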

Finally, noting that \(D_{ST}^{\prime} = \left[(s - 1)\sum_{ik} p_{ik}^{2} - \sum_{i \ne i',\,k} p_{ik}p_{i'k}\right]/[s(s - 1)]\), we can write from (A11), in terms of Nei’s parameters, the sum over all alleles of the between-population expected mean squares:

$$ \sum_{k} E(MS_{pop})_{k} = H_{0} + 2(H_{S} - H_{0}) + 2n\,D_{ST}^{\prime}. \tag{A17} $$
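Equations (A15) to (A17) can be solved for the parameters, giving moment estimators from the mean squares summed over alleles: \(\hat H_0 = \sum_k (MS_{all/ind})_k\), \(\hat H_S = [\sum_k (MS_{ind/pop})_k + \hat H_0]/2\) and \(\hat D'_{ST} = [\sum_k (MS_{pop})_k - \sum_k (MS_{ind/pop})_k]/2n\). Being linear in the mean squares, these have the right expectations under the model. The sketch below is a reading of those equations, not the authors' code; the function name `moment_estimates` is hypothetical, and the extra step \(D_{ST} = (s - 1)D'_{ST}/s\), \(H_T = H_S + D_{ST}\) is an addition not taken from the text, resting on the identity \(D'_{ST} = \sum_{ik}(p_{ik} - \overline{p}_k)^2/(s - 1)\). The input mean squares are hypothetical; they were chosen to equal the exact expectations of a small two-population, three-allele example with n = 3, so the parameters are recovered exactly.

```python
def moment_estimates(sum_ms_pop, sum_ms_ind_pop, sum_ms_all_ind, n, s):
    """Solve (A15)-(A17) for H_0, H_S and D'_ST, plugging in observed mean squares
    summed over alleles. Also returns Delta = H_S - H_0 and, as an added
    assumption-based step, D_ST = (s - 1) D'_ST / s and H_T = H_S + D_ST."""
    h_0 = sum_ms_all_ind                                  # (A15)
    h_s = (sum_ms_ind_pop + h_0) / 2                      # (A16): 2 H_S - H_0
    d_st_prime = (sum_ms_pop - sum_ms_ind_pop) / (2 * n)  # (A17) minus (A16)
    d_st = (s - 1) * d_st_prime / s                       # Nei's D_ST, divisor s
    return {"H0": h_0, "HS": h_s, "Delta": h_s - h_0,
            "D'ST": d_st_prime, "DST": d_st, "HT": h_s + d_st}


est = moment_estimates(sum_ms_pop=1.34, sum_ms_ind_pop=0.695,
                       sum_ms_all_ind=0.45, n=3, s=2)
print(est)
```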

Cite this article

Gallais, A., Lefèvre, F. ANOVA for estimating Nei’s diversity and related parameters in a fixed set of populations with an application in genetic resources conservation. Euphytica 217, 192 (2021). https://doi.org/10.1007/s10681-021-02904-x
